Nucleic acid molecules specific for bacterial
antigens and uses thereof.



TECHNICAL FIELD 

The invention relates to novel nucleotide sequences
located in a gene cluster which controls the synthesis of
a bacterial polysaccharide antigen, especially an O
antigen, and the use of those nucleotide sequences for the
detection of bacteria which express particular
polysaccharide antigens (particularly O antigens) and for
the identification of the polysaccharide antigens
(particularly O antigens) of those bacteria.



BACKGROUND ART 

Enteropathogenic E. coli strains are well known
causes of diarrhoea and haemorrhagic colitis in humans and
can lead to potentially life threatening sequelae
including haemolytic uremic syndrome and thrombotic
thrombocytopaenic purpura. Some of these strains are
commonly found in livestock and infection in humans is
usually a consequence of consumption of contaminated meat
or dairy products which have been improperly processed.
The O specific polysaccharide component (the "O antigen")
of lipopolysaccharide is known to be a major virulence
factor of enteropathogenic E. coli strains.

The E. coli O antigen is highly polymorphic and 166
different forms of the antigen have been defined; Ewing,
W. H. [in Edwards and Ewing's "Identification of the
Enterobacteriaceae" Elsevier, Amsterdam (1986)] discusses
128 different O antigens while Lior H. (1994) extends the
number to 166 [in "Classification of Escherichia coli" In
Escherichia coli in domestic animals and humans pp 31-72.
Edited by C.L. Gyles, CAB International]. The genus
Salmonella enterica has 46 known O antigen types [Popoff
M.Y. et al (1992) "Antigenic formulas of the Salmonella
enterica serovars" 6th revision WHO Collaborating Centre
for Reference and Research on Salmonella enterica, Institut
Pasteur Paris France].

An important step in determining the biosynthesis of
O antigens, and therefore the mechanism of the polymorphism,
has been to characterise the gene clusters controlling O
antigen biosynthesis. The genes specific for the
synthesis of the O antigen are generally located in a gene
cluster at map position 45 minutes on the chromosome of E.
coli K-12 [Bachmann, B. J. 1990 "Linkage map of
Escherichia coli K-12". Microbiol. Rev. 54: 130-197], and
at the corresponding position in S. enterica LT2
[Sanderson et al (1995) "Genetic map of Salmonella
enterica typhimurium", Edition VIII Microbiol. Rev. 59:
241-303]. In both cases the O antigen gene cluster is
close to the gnd gene as is the case in other strains of
E. coli and S. enterica [Reeves P.R. (1994) "Biosynthesis
and assembly of lipopolysaccharide", 281-314, in A.
Neuberger and L.L.M. van Deenen (eds) "Bacterial cell
wall, new comprehensive biochemistry" vol 27 Elsevier
Science Publishers]. These genes encode enzymes for the
synthesis of nucleotide diphosphate sugars and for
assembly of the sugars into oligosaccharide units and in
general for polymerisation to O antigen.

The E. coli O antigen gene clusters for a wide range
of E. coli O antigens have been cloned but the O7, O9, O16
and O111 O antigens have been studied in more detail with
only O9 and O16 having been fully characterised with
regard to nucleotide sequence to date [Kido N., Torgov
V.I., Sugiyama T., Uchiya K., Sugihara H., Komatsu T.,
Kato N. & Jann K. (1995) "Expression of the O9
polysaccharide of Escherichia coli: sequencing of the E.
coli O9 rfb gene cluster, characterisation of mannosyl
transferases, and evidence for an ATP-binding cassette
transport system" J. of Bacteriol. 177: 2178-2187;
Stevenson G., Neal B., Liu D., Hobbs M., Packer N.H.,
Batley M., Redmond J.W., Lindquist L. & Reeves P.R. (1994)
"Structure of the O antigen of E. coli K12 and the
sequence of its rfb gene cluster" J. of Bacteriol. 176:
4144-4156; Jayaratne, P. et al. (1991) "Cloning and
analysis of duplicated rfbM and rfbK genes involved in the
formation of GDP-mannose in Escherichia coli O9:K30 and
participation of rfb genes in the synthesis of the group 1
K30 capsular polysaccharide" J. Bacteriol. 176: 3126-3139;
Valvano, M. A. and Crosa, J. H. (1989) "Molecular cloning
and expression in Escherichia coli K-12 of chromosomal
genes determining the O7 lipopolysaccharide antigen of a
human invasive strain of E. coli O7:K1". Inf. and Immun.
57: 937-943; Marolda C. L. and Valvano, M. A. (1993)
"Identification, expression, and DNA sequence of the
GDP-mannose biosynthesis genes encoded by the O7 rfb gene
cluster of strain VW187 (Escherichia coli O7:K1)". J.
Bacteriol. 175: 148-158].

Bastin D.A., et al. 1991 ["Molecular cloning and
expression in Escherichia coli K-12 of the rfb gene
cluster determining the O antigen of an E. coli O111
strain". Mol. Microbiol. 5:9 2223-2231] and Bastin D.A.
and Reeves, P.R. [(1995) "Sequence and analysis of the O
antigen gene (rfb) cluster of Escherichia coli O111". Gene
164: 17-23] isolated chromosomal DNA encoding the E. coli
O111 rfb region and characterised a 6962 bp fragment of E.
coli O111 rfb. Six open reading frames (orfs) were
identified in the 6962 bp partial fragment and the
alignment of the sequences of these orfs revealed homology
with genes of the GDP-mannose pathway, rfbK and rfbM, and
other rfb and cps genes.

The nucleotide sequences of the loci which control
expression of Salmonella enterica B, A, D1, D2, D3, C1, C2
and E O antigens have been characterised [Brown, P. K., L.
K. Romana and P. R. Reeves (1991) "Cloning of the rfb gene
cluster of a group C2 Salmonella enterica: comparison with
the rfb regions of groups B and D" Mol. Microbiol. 5:
1873-1881; Jiang, X.-M., B. Neal, F. Santiago, S. J. Lee,
L. K. Romana, and P. R. Reeves (1991) "Structure and sequence
of the rfb (O antigen) gene cluster of Salmonella enterica
serovar typhimurium (LT2)". Mol. Microbiol. 5: 692-713;
Lee, S. J., L. K. Romana, and P. R. Reeves (1992)
"Sequences and structural analysis of the rfb (O
antigen) gene cluster from a group C1 Salmonella enterica
strain" J. Gen. Microbiol. 138: 1843-1855; Liu,
D., N. K. Verma, L. K. Romana, and P. R. Reeves (1991)
"Relationship among the rfb regions of Salmonella enterica
serovars A, B and D" J. Bacteriol. 173: 4814-4819; Verma,
N. K., and P. Reeves (1989) "Identification and sequence
of rfbS and rfbE, which determine the antigenic
specificity of group A and group D Salmonella entericae"
J. Bacteriol. 171: 5694-5701; Wang, L., L. K. Romana, and
P. R. Reeves (1992) "Molecular analysis of a Salmonella
enterica group E1 rfb gene cluster: O antigen and
the genetic basis of the major polymorphism" Genetics
130: 429-443; Wyk, P., and P. Reeves (1989)
"Identification and sequence of the gene for abequose
synthase, which confers antigenic specificity on group B
Salmonella entericae: homology with galactose epimerase"
J. Bacteriol. 171: 5687-5693; Xiang, S. H., M. Hobbs, and
P. R. Reeves (1994) "Molecular analysis of the rfb gene
cluster of a group D2 Salmonella enterica strain: evidence
for its origin from an insertion sequence-mediated
recombination event between group E and D1 strains" J.
Bacteriol. 176: 4357-4365; Curd, H., D. Liu and P. R.
Reeves (1998) "Relationships among the O antigen Salmonella
enterica groups B, D1, D2, and D3" J. Bacteriol. 180:
1002-1007].

Of the closely related Shigella (which really can be
considered to be part of E. coli) S. dysenteriae and S.
flexneri O antigens have been fully sequenced and are next
to gnd [Klena J.D. & Schnaitman C.A. (1993) "Function of the
rfb gene cluster and the rfe gene in the synthesis of O
antigen by Shigella dysenteriae 1" Mol. Microbiol. 9:
393-402; Morona R., Mavris M., Fallarino A. & Manning P.
(1994) "Characterisation of the rfc region of Shigella
flexneri" J. Bacteriol. 176: 733-747].

Inasmuch as the O antigen of enteropathogenic E. coli
strains and the O antigen of Salmonella enterica strains
are major virulence factors and are highly polymorphic,
there is a real need to develop highly specific,
sensitive, rapid and inexpensive diagnostic assays to



WO 98/50531 



PCT/AU98/00315 



detect E. coli and assays to detect S. enterica. There is
also a real need to develop diagnostic assays to identify
the O antigens of E. coli strains and assays to identify
the O antigens of S. enterica strains. With regard to the
detection of E. coli these needs extend beyond EHEC
(enteropathogenic haemorrhagic E. coli) strains but this
is the area of greatest need. There is interest in
diagnostics for ETEC (enterotoxigenic E. coli) etc in E.
coli.

The first diagnostic systems employed in this field
used large panels of antisera raised against E. coli O
antigen expressing strains or S. enterica O antigen
expressing strains. This technology has inherent
difficulties associated with the preparation, storage and
usage of the reagents, as well as the time required to
achieve a meaningful diagnostic result.

Nucleotide sequences derived from the O antigen gene
clusters of S. enterica strains have been used to
determine S. enterica O antigens in a PCR assay [Luk,
J.M.C. et al. (1993) "Selective amplification of abequose
and paratose synthase genes (rfb) by polymerase chain
reaction for identification of S. enterica major serogroups
(A, B, C2, and D)", J. Clin. Microbiol. 31: 2118-2123].
The prior complete nucleotide sequence characterisation of
the entire rfb locus of serovars Typhimurium, Paratyphi A,
Typhi, Muenchen, and Anatum, representing groups B, A, D1,
C2 and E1 respectively, enabled Luk et al. to select
oligonucleotide primers specific for those serogroups.
Thus the approach of Luk et al. was based on aligning
known nucleotide sequences corresponding to CDP-abequose
and CDP-paratose synthesis genes within the O antigen
regions of S. enterica serogroups E1, D1, A, B and C2 and
exploiting the observed nucleotide sequence differences in
order to identify serotype-specific oligonucleotides.
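
The idea of aligning related genes and exploiting the observed
differences can be illustrated with a short Python sketch. This is
not the procedure of Luk et al.; the function name, the thresholds
and the toy sequences are assumptions made purely for illustration,
and the sequences are not real rfb data.

```python
def specific_windows(target, others, k=12, min_mismatches=3):
    """Return (start, oligo) for each k-long window of `target` that differs
    from every sequence in `others` at >= min_mismatches aligned positions.

    All sequences are assumed to be pre-aligned and of equal length.
    """
    if any(len(o) != len(target) for o in others):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for start in range(len(target) - k + 1):
        window = target[start:start + k]
        if "-" in window:  # skip windows that contain alignment gaps
            continue
        specific = True
        for other in others:
            mismatches = sum(
                1 for a, b in zip(window, other[start:start + k]) if a != b
            )
            if mismatches < min_mismatches:
                specific = False
                break
        if specific:
            hits.append((start, window))
    return hits


# Toy aligned sequences (hypothetical, not real gene sequences).
group_b = "ATGGCTAAACGTTTAGGCCATGCA"
group_a = "ATGGCTAAACGTAATCCGCATGCA"
group_d = "ATGGCTAAACGTAATCCGCATGCA"

candidates = specific_windows(group_b, [group_a, group_d], k=8, min_mismatches=4)
```

Only windows overlapping the divergent stretch of the toy alignment
are returned; windows drawn from the conserved flanks are rejected,
which is the behaviour one would want from a candidate
serogroup-specific oligonucleotide.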

In an attempt to determine the O antigen serotype of
a Shiga-like toxin producing E. coli strain, Paton, A. W.,
et al. 1996 ["Molecular microbiological investigation of
an outbreak of Hemolytic-Uremic Syndrome caused by dry
fermented sausage contaminated with Shiga-like toxin
producing Escherichia coli". J. Clin. Microbiol. 34:
1622-1627], used oligonucleotides derived from the wbdI (orf6)
region, which were believed to be specific to the E. coli
O111 antigen and which were derived from E. coli O111
sequence, in a PCR diagnostic assay. Unpublished reports
indicate that the approach of Paton et al. is deficient in
that the nucleotide sequences derived from wbdI may not
specifically identify the O111 antigen and in fact lead to
detection of false positive results. Paton et al.
disclose the detection of 5 O111 antigen isolates by PCR
when in fact from only 3 of those isolates did they detect
bacteria which reacted with O111 specific antiserum.



DESCRIPTION OF THE INVENTION

Whilst not wanting to be held to a particular
hypothesis, the present inventors now believe that the
reported false positives found with the Paton et al.
method are due to the fact that the nucleic acid molecules
employed by Paton et al. were derived from genes which
have a putative function as a sugar pathway gene [Bastin
D.A. and Reeves, P.R. (1995) "Sequence and analysis of the
O antigen gene (rfb) cluster of Escherichia coli O111". Gene
164: 17-23], which they now believe to lack the necessary
nucleotide sequence specificity to identify the E. coli O
antigen. The inventors now believe that many of the
nucleic acid molecules derived from sugar pathway genes
expressed in S. enterica or other enterobacteria are also
likely to lack the necessary nucleotide sequence
specificity to identify specific O antigens or specific
serotypes.

In this regard it is important to note that the genes
for the synthesis of a polysaccharide antigen include
those related to the synthesis of the sugars present in
the antigen (sugar pathway genes) and those related to the
manipulation of those sugars to form the polysaccharide.
The present invention is predominantly concerned with the
latter group of genes, particularly the assembly and
transport genes such as transferase, polymerase and
flippase genes.

The present inventors have surprisingly found that 
the use of nucleic acid molecules derived from particular 
assembly and transport genes, particularly transferase, 
wzx and wzy genes, within O antigen gene clusters can
improve the specificity of the detection and 
identification of O antigens. The present inventors 
believe that the invention is not necessarily limited to 
the detection of the particular O antigens which are 
encoded by the nucleic acid molecules exemplified herein, 
but has broad application for the detection of bacteria 
which express an O antigen and the identification of O 
antigens in general. Further, because of the similarities
between the gene clusters involved in the synthesis of O 
antigens and other polymorphic polysaccharide antigens, 
such as bacterial capsular antigens, the inventors believe 
that the methods and molecules of the present invention 
are also applicable to these other polysaccharide 
antigens.

Accordingly, in one aspect the present invention 
relates to the identification of nucleic acid molecules 
which are useful for the detection and identification of 
specific bacterial polysaccharide antigens. 

The invention provides a nucleic acid molecule
derived from: a gene encoding a transferase; or a gene
encoding an enzyme for the transport or processing of a
polysaccharide or oligosaccharide unit, including a wzx
gene, wzy gene, or a gene with a similar function; the
gene being involved in the synthesis of a particular
bacterial polysaccharide antigen,
wherein the sequence of the nucleic acid molecule is
specific to the particular bacterial polysaccharide
antigen.

Polysaccharide antigens, such as capsular antigens of
E. coli (Type I and Type II), the Virulence capsule of S.
enterica sv. Typhi and the capsules of species such as
Streptococcus pneumoniae and Staphylococcus albus are
encoded by genes which include nucleotide sugar pathway
genes, sugar transferase genes and genes for the transport
and processing of the polysaccharide or oligosaccharide
unit. In some cases these are wzx or wzy but in other
cases they are quite different because a different
processing pathway is used. Examples of other gene
clusters include the gene clusters for an extracellular
polysaccharide of Streptococcus thermophilus, an
exopolysaccharide of Rhizobium meliloti and the K2
capsule of Klebsiella pneumoniae. These all have genes
which by experimental analysis, comparison of nucleotide
sequence or predicted protein structure, can be seen to
include nucleotide sugar pathway genes, sugar transferase
genes and genes for oligosaccharide or polysaccharide
processing.

In the case of the E. coli K-12 colanic acid capsule
gene cluster [Stevenson et al (1996) "Organization of the
Escherichia coli K-12 gene cluster responsible for
production of the extracellular polysaccharide colanic
acid". J. Bacteriol. 178: 4885-4893] genes from the three
classes were identified either provisionally or
definitively. Colanic acid capsule is classified with the
Type I capsule of E. coli.

The present inventors believe that, in general, 
transferase genes and genes for oligosaccharide processing 
will be more specific for a given capsule than the genes 
coding for the nucleotide sugar synthetic pathways as most 
sugars present in such capsules occur in the capsules of 
different serotypes. Thus the nucleotide sugar synthesis 
pathway genes could now be predicted to be common to more 
than one capsule type. 

As elaborated below the present inventors recognise
that there may be polysaccharide antigen gene clusters
which share transferase genes and/or genes for
oligosaccharide or polysaccharide processing so that
completely random selection of nucleotide sequences from
within these genes may still lead to cross-reaction; an
example with respect to capsular antigens is provided by
the E. coli type II capsules for which only transferase
genes are sufficiently specific. However, the present
inventors in light of their current results nonetheless
consider the transferase genes or genes controlling
oligosaccharide or polysaccharide processing to be
superior targets for nucleotide sequence selection for the
specific detection and characterisation of polysaccharide
antigen types. Thus where there is similarity between
particular genes, selection of nucleotide sequences from
within other transferase genes or genes for
oligosaccharide or polysaccharide processing from within
the relevant gene cluster will still provide specificity,
or alternatively the use of combinations of nucleotide
sequences will provide the desired specificity. The
combinations of nucleotide sequences may include
nucleotide sequences derived from pathway genes together
with nucleotide sequences derived from transferase, wzx or
wzy genes.

Thus the invention also provides a panel of nucleic
acid molecules wherein the nucleic acid molecules are
derived from a combination of genes encoding transferases
and/or enzymes for the transport or processing of a
polysaccharide or oligosaccharide unit including wzx or
wzy genes; wherein the combination of genes is specific to
the synthesis of a particular bacterial polysaccharide
antigen and wherein the panel of nucleic acid molecules is
specific to a bacterial polysaccharide antigen. In
another preferred form, the nucleic acid molecules are
derived from a combination of genes encoding transferases
and/or enzymes for the transport or processing of a
polysaccharide or oligosaccharide unit including wzx or
wzy genes, together with nucleic acid molecules derived
from pathway genes.

In a second aspect the present invention relates to
the identification of nucleic acid molecules which are
useful for the detection of bacteria which express O
antigens and for the identification of the O antigens of
those bacteria in diagnostic assays.

The invention provides a nucleic acid molecule
derived from: a gene encoding a transferase; or a gene
encoding an enzyme for the transport or processing of a
polysaccharide or oligosaccharide unit such as a wzx or
wzy gene, the gene being involved in the synthesis of a
particular bacterial O antigen, wherein the sequence of
the nucleic acid molecule is specific to the particular
bacterial O antigen.

The nucleic acids of the invention may be variable in
length. In one embodiment they are from about 10 to about
20 nucleotides in length.

In one preferred embodiment, the invention provides a
nucleic acid molecule derived from: a gene encoding a
transferase; or a gene encoding an enzyme for the
transport or processing of a polysaccharide or
oligosaccharide unit including a wzx or wzy gene, the gene
being involved in the synthesis of an O antigen expressed
by E. coli, wherein the sequence of the nucleic acid
molecule is specific to the O antigen.

In one more preferred embodiment, the sequence of the
nucleic acid molecule is specific to the nucleotide
sequence encoding the O111 antigen (SEQ ID NO:1). More
preferably, the sequence is derived from a gene selected
from the group consisting of wbdH (nucleotide position 739
to 1932 of SEQ ID NO:1), wzx (nucleotide position 8646 to
9911 of SEQ ID NO:1), wzy (nucleotide position 9901 to
10953 of SEQ ID NO:1), wbdM (nucleotide position 11821 to
12945 of SEQ ID NO:1) and fragments of those molecules of
at least 10-12 nucleotides in length. Particularly
preferred nucleic acid molecules are those set out in
Table 5 and 5A, with respect to the above mentioned genes.
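
The nucleotide positions quoted above translate directly into
sequence slices. The following Python sketch is illustrative only:
the coordinate table is copied from the text and is assumed to be
1-based and inclusive, while the helper names and the placeholder
sequence are hypothetical (SEQ ID NO:1 itself is not reproduced
here).

```python
# Gene coordinates within SEQ ID NO:1 as stated in the text
# (assumed to be 1-based and inclusive).
O111_GENES = {
    "wbdH": (739, 1932),
    "wzx": (8646, 9911),
    "wzy": (9901, 10953),
    "wbdM": (11821, 12945),
}


def gene_length(name):
    """Length in nucleotides of the named gene."""
    start, end = O111_GENES[name]
    return end - start + 1


def gene_sequence(seq, name):
    """Slice the named gene out of the full cluster sequence (a string)."""
    start, end = O111_GENES[name]
    if end > len(seq):
        raise ValueError("sequence is shorter than the stated gene coordinates")
    return seq[start - 1:end]  # convert 1-based inclusive to a Python slice


def fragments(gene_seq, length=12):
    """Yield every contiguous fragment of the given length."""
    for i in range(len(gene_seq) - length + 1):
        yield gene_seq[i:i + length]


# Placeholder standing in for SEQ ID NO:1 (hypothetical, not real sequence data).
placeholder = "ACGT" * 3500
wzx = gene_sequence(placeholder, "wzx")
n_fragments = sum(1 for _ in fragments(wzx, length=12))
```

Each stated interval has a length divisible by three, which is
consistent with reading the positions as inclusive open reading
frame boundaries.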

In another more preferred embodiment, the sequence of
the nucleic acid molecule is specific to the nucleotide
sequence encoding the O157 antigen (SEQ ID NO:2). More
preferably the sequence is derived from a gene selected
from the group consisting of wbdN (nucleotide position 79
to 861 of SEQ ID NO:2), wbdO (nucleotide position 2011 to
2757 of SEQ ID NO:2), wbdP (nucleotide position 5257 to
6471 of SEQ ID NO:2), wbdR (nucleotide position 13156 to
13821 of SEQ ID NO:2), wzx (nucleotide position 2744 to
4135 of SEQ ID NO:2) and wzy (nucleotide position 858 to
2042 of SEQ ID NO:2). Particularly preferred nucleic acid
molecules are those set out in Table 6 and 6A.

The invention also provides in a further preferred 
embodiment a nucleic acid molecule derived from: a gene 
encoding a transferase; or a gene encoding an enzyme for 
the transport or processing of a polysaccharide or 
oligosaccharide unit including a wzx or wzy gene; the gene 
being involved in the synthesis of an O antigen expressed 
by Salmonella enterica, wherein the sequence of the
nucleic acid molecule is specific to the O antigen. 

In one more preferred form of this embodiment, the
sequence of the nucleic acid molecule is specific to the
nucleotide sequence encoding the S. enterica C2 antigen
(SEQ ID NO:3). More preferably the sequence of the
nucleic acid molecule is derived from a gene selected from
the group consisting of wbaR (nucleotide position 2352 to
3314 of SEQ ID NO:3), wbaL (nucleotide position 3361 to
3875 of SEQ ID NO:3), wbaQ (nucleotide position 3977 to
5020 of SEQ ID NO:3), wbaW (nucleotide position 6313 to
7323 of SEQ ID NO:3), wbaZ (nucleotide position 7310 to
8467 of SEQ ID NO:3), wzx (nucleotide position 1019 to
2359 of SEQ ID NO:3) and wzy (nucleotide position 5114 to
6313 of SEQ ID NO:3). Particularly preferred nucleic acid
molecules are those set out in Table 7.

In another more preferred form of this embodiment,
the sequence of the nucleic acid molecule is specific to
the nucleotide sequence encoding the S. enterica B antigen
(SEQ ID NO:4). More preferably the sequence is derived
from wzx (nucleotide position 12762 to 14054 of SEQ ID
NO:4) or wbaV (nucleotide position 14059 to 15060 of SEQ
ID NO:4). Particularly preferred nucleic acid molecules
are those set out in Table 8 which are derived from wzx
and wbaV genes.

In a further more preferred form of this embodiment,
the sequence of the nucleic acid molecule is specific to
the S. enterica D3 O antigen and is derived from the wzy
gene.

In yet a further preferred form of this embodiment,
the sequence of the nucleic acid molecule is specific to
the S. enterica E1 O antigen and is derived from the wzx
gene.

While transferase genes, or genes coding for the
transport or processing of a polysaccharide or
oligosaccharide unit, such as a wzx or wzy gene, are
superior targets for specific detection of individual O
antigen types there may well be individual genes or parts
of them within this group that can be demonstrated to be
the same or closely related between different O antigen
types such that cross-reactions can occur.
Cross-reactions should be avoided by the selection of a
different target within the group or the use of multiple
targets within the group.

Further, it is recognised that there are cases where
O antigen gene clusters have arisen from recombination of
at least two strains such that the unique O antigen type
is provided by a combination of gene products shared with
at least two other O antigen types. The recognised
example of this phenomenon is the S. enterica O antigen
serotype D2 which has genes from D1 and E1 but none unique
to D2. In these circumstances the detection of the O
antigen type can still be achieved in accordance with the
invention, but requires the use of a combination of
nucleic acid molecules to detect a specific combination of
genes that exists only in that particular O antigen gene
cluster.

Thus, the invention also provides a panel of nucleic
acid molecules wherein the nucleic acid molecules are
derived from genes encoding transferases and/or enzymes
for the transport or processing of a polysaccharide or
oligosaccharide unit including wzx or wzy genes, wherein
the panel of nucleic acid molecules is specific to a
bacterial O antigen. Preferably the particular bacterial
O antigen is expressed by S. enterica. More preferably,
the panel of nucleic acid molecules is specific to the D2
O antigen and is derived from the E1 wzy gene and the D1
wzx gene.
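
The combination logic for a recombinant cluster such as D2 can be
sketched in Python as follows. This is an illustrative assumption
about how probe results might be combined, not an assay described
in the text; the probe labels and the result format are
hypothetical.

```python
# A D2 call requires both markers named in the text:
# the E1 wzy gene and the D1 wzx gene (labels are hypothetical).
D2_PANEL = frozenset({"E1_wzy", "D1_wzx"})


def is_d2(positive_probes):
    """Return True only if every probe in the D2 panel gave a positive signal."""
    return D2_PANEL <= set(positive_probes)


# Hypothetical probe results for three isolates.
results = {
    "isolate_1": {"E1_wzy", "D1_wzx"},  # both panel members detected
    "isolate_2": {"D1_wzx"},            # only the D1 wzx probe detected
    "isolate_3": {"E1_wzy"},            # only the E1 wzy probe detected
}
calls = {name: is_d2(hits) for name, hits in results.items()}
```

Neither probe is specific to D2 on its own; only the conjunction of
the two signals distinguishes the recombinant cluster from its
parental types.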

The combinations of nucleotide sequences may include 
nucleotide sequences derived from pathway genes, together 
with nucleotide sequences derived from transferase, wzx or 
wzy genes. 

Thus, the invention also provides a panel of nucleic 
acid molecules, wherein the nucleic acid molecules are 
derived from genes encoding transferases and/or enzymes 
for the transport or processing of a polysaccharide or 
oligosaccharide unit including wzx or wzy genes, and sugar 
pathway genes, wherein the panel of nucleic acid molecules 
is specific to a particular bacterial O antigen. 
Preferably the O antigen is expressed by S. enterica.

Further it is recognised that there may be instances 
where spurious hybridisation will arise through initial 
selection of a sequence found in many different genes but 
this is typically recognisable by, for instance, 
comparison of band sizes against controls in PCR gels, and 
an alternative sequence can be selected. 

The present inventors believe that based on the 
teachings of the present invention and available 
information concerning polysaccharide antigen gene 
clusters (including O antigen gene clusters), and through
use of experimental analysis, comparison of nucleic acid 
sequences or predicted protein structures, nucleic acid 
molecules in accordance with the invention can be readily 
derived for any particular polysaccharide antigen of 
interest. Suitable bacterial strains can typically be 
acquired commercially from depositary institutions. 

As mentioned above there are currently 166 defined
E. coli O antigens while S. enterica has 46 known O
antigen types [Popoff M.Y. et al (1992) "Antigenic
formulas of the Salmonella serovars" 6th revision WHO
Collaborating Centre for Reference and Research on
Salmonella, Institut Pasteur Paris France]. Many other
genera of bacteria are known to have O antigens and these
include Citrobacter, Shigella, Yersinia, Plesiomonas,
Vibrio and Proteus.

Samples of the 166 different E. coli O antigen
serotypes are available from Statens Serum Institut,
Copenhagen, Denmark.

The 46 S. enterica serotypes are available from
Institute of Medical and Veterinary Science, Adelaide,
Australia.

In another aspect, the invention relates to a method
of testing a sample for the presence of one or more
bacterial polysaccharide antigens comprising contacting
the sample with at least one oligonucleotide molecule
capable of specifically hybridising to: (i) a gene
encoding a transferase, or (ii) a gene encoding an enzyme
for transport or processing of oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the bacterial
polysaccharide antigen; under conditions suitable to
permit the at least one oligonucleotide molecule to
specifically hybridise to at least one such gene of any
bacteria expressing the particular bacterial
polysaccharide antigen present in the sample and detecting
any specifically hybridised oligonucleotide molecules.

Where a single specific oligonucleotide molecule is
unavailable a combination of molecules hybridising
specifically to the target region may be used. Thus the
invention provides a panel of nucleic acid molecules for
use in the method of testing of the invention, wherein the
nucleic acid molecules are derived from genes encoding
transferases and/or enzymes for the transport or
processing of a polysaccharide or oligosaccharide unit
including wzx or wzy genes, wherein the panel of nucleic
acid molecules is specific to a particular bacterial
polysaccharide. The panel of nucleic acid molecules can
include nucleic acid molecules derived from sugar pathway
genes where necessary.

In another aspect, the invention relates to a method
of testing a sample for the presence of one or more
bacterial polysaccharide antigens comprising contacting
the sample with at least one pair of oligonucleotide
molecules, with at least one oligonucleotide molecule of
the pair capable of specifically hybridising to: (i) a
gene encoding a transferase, or (ii) a gene encoding an
enzyme for transport or processing of oligosaccharide or
polysaccharide units, including a wzx or wzy gene; wherein
said gene is involved in the synthesis of the bacterial
polysaccharide antigen; under conditions suitable to
permit the at least one oligonucleotide molecule of the
pair of molecules to specifically hybridise to at least
one such gene of any bacteria expressing the particular
bacterial polysaccharide antigen present in the sample and
detecting any specifically hybridised oligonucleotide
molecules.

The pair of oligonucleotide molecules may both 
hybridise to the same gene or to different genes. Only 
one oligonucleotide molecule of the pair need hybridise 
specifically to sequence specific for the particular 
antigen type. The other molecule can hybridise to a non- 
specific region. 
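
A primer pair of this kind, in which only one member is specific,
can be checked against a template in silico. The Python sketch
below is a simplified illustration under stated assumptions: exact
matching only, a single linear template, hypothetical function
names, and toy sequences that are not derived from any real gene
cluster.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplicon_size(template, forward, reverse):
    """Return the product length if both primers bind convergently, else None.

    `forward` matches the template strand as given; `reverse` is written
    5'->3' and binds the opposite strand, so its reverse complement is
    searched for downstream of the forward primer site.
    """
    fwd_pos = template.find(forward)
    if fwd_pos == -1:
        return None
    rev_site = reverse_complement(reverse)
    rev_pos = template.find(rev_site, fwd_pos + len(forward))
    if rev_pos == -1:
        return None
    return rev_pos + len(rev_site) - fwd_pos


# Toy templates (hypothetical): both carry the shared reverse-primer site,
# but only the target carries the specific forward-primer site.
shared_site = "GGATCCGATTACA"
target = "TTT" + "ACGTGACCTGAA" + "CCCCCCCCCC" + shared_site + "AAA"
non_target = "TTT" + "ACGAGTCCAGTA" + "CCCCCCCCCC" + shared_site + "AAA"

specific_fwd = "ACGTGACCTGAA"                  # specific member of the pair
generic_rev = reverse_complement(shared_site)  # non-specific member

size_target = amplicon_size(target, specific_fwd, generic_rev)
size_non_target = amplicon_size(non_target, specific_fwd, generic_rev)
```

In this toy case a product is predicted only for the template that
carries the specific site, even though the second primer would bind
both templates.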

Where the particular polysaccharide antigen gene 
cluster has arisen through recombination, the at least one 
pair of oligonucleotide molecules may be selected to be 
capable of hybridising to a specific combination of genes 
in the cluster specific to that polysaccharide antigen, or 
multiple pairs may be selected to provide hybridisation to 
the specific combination of genes. Even where all the 
genes in a particular cluster are unique, the method may 
be carried out using nucleotide molecules which recognise 
a combination of genes within the cluster. 

Thus the invention provides a panel containing pairs 
of nucleic acid molecules for use in the method of testing 
of the invention, wherein the pairs of nucleic acid 
molecules are derived from genes encoding transferases 
and/or enzymes for the transport or processing of a 
polysaccharide or oligosaccharide unit including wzx or 
wzy genes, wherein the panel of nucleic acid molecules is
specific to a particular bacterial polysaccharide antigen.
The panel of nucleic acid molecules can include pairs of 
nucleic acid molecules derived from sugar pathway genes 
where necessary. 

In another aspect, the invention relates to a method 
of testing a sample for the presence of one or more 
particular bacterial 0 antigens comprising contacting the 
sample with at least one oligonucleotide molecule capable 
of specifically hybridising to: (i) a gene encoding an O 
antigen transferase, or (ii) a gene encoding an enzyme for 
transport or processing of the oligosaccharide or 
polysaccharide unit, including a wzx or wzy gene; wherein 
said gene is involved in the synthesis of the particular O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule to specifically hybridise to 
at least one such gene of any bacteria expressing the 
particular bacterial O antigen present in the sample and 
detecting any specifically hybridised oligonucleotide 
molecules. Preferably the bacteria are E. coli or 
S. enterica. More preferably, the E. coli express the O157 
serotype or the O111 serotype. More preferably the S. 
enterica express the C2 or B serotype. Preferably, the 
method is a Southern blot method. More preferably, the 
nucleic acid molecule is labelled and hybridisation of the 
nucleic acid molecule is detected by autoradiography or 
detection of fluorescence. 

The inventors envisage circumstances where a single 
specific oligonucleotide molecule is unavailable. In 
these circumstances a combination of molecules hybridising 
specifically to the target region may be used. Thus the 
invention provides a panel of nucleic acid molecules for 
use in the method of testing of the invention, wherein the 
nucleic acid molecules are derived from genes encoding 
transferases and/or enzymes for the transport or 
processing of a polysaccharide or oligosaccharide unit 
including wzx or wzy genes, wherein the panel of nucleic 
acid molecules is specific to a particular bacterial O 
antigen. Preferably the particular bacterial O antigen is 
expressed by S. enterica. The panel of nucleic acid 
molecules can include nucleic acid molecules derived from 
sugar pathway genes where necessary. 

In another aspect, the invention relates to a method 
of testing a sample for the presence of one or more 
particular bacterial O antigens comprising contacting the 
sample with at least one pair of oligonucleotide molecules 
with at least one oligonucleotide molecule of the pair 
being capable of specifically hybridising to: (i) a gene 
encoding an O antigen transferase, or (ii) a gene encoding 
an enzyme for transport or processing of the 
oligosaccharide or polysaccharide unit, including a wzx or 
wzy gene; wherein said gene is involved in the synthesis 
of the particular O antigen; under conditions suitable to 
permit the at least one oligonucleotide molecule to 
specifically hybridise to at least one such gene of any 
bacteria expressing the particular bacterial O antigen 
present in the sample and detecting any specifically 
hybridised oligonucleotide molecules. 

Preferably the bacteria are E. coli or S. enterica. 
More preferably, the E. coli are of the O111 or the O157 
serotype. More preferably the S. enterica express the C2 
or B serotype. Preferably, the method is a polymerase 
chain reaction method. More preferably the oligonucleotide 
molecules for use in the method of the invention are 
labelled. Even more preferably the hybridised 
oligonucleotide molecules are detected by electrophoresis. 
Preferred oligonucleotides for use with O111 which provide 
for specific detection of O111 are illustrated in Tables 5 
and 5A with respect to the genes wbdH, wzx, wzy and wbdM. 
Preferred oligonucleotide molecules for use with O157 
which provide for specific detection of O157 are 
illustrated in Tables 6 and 6A. 

With respect to serotypes C2 and B, suitable 
oligonucleotide molecules can be selected from appropriate 
regions described in column 3 of Tables 7 and 8. 

The inventors envisage rare circumstances whereby two 
genetically similar gene clusters encoding serologically 
different O antigens have arisen through recombination of 
genes or mutation so as to generate polymorphic variants. 
In these circumstances multiple pairs of oligonucleotides 
may be selected to provide hybridisation to the specific 
combination of genes. The invention thus provides a panel 
containing pairs of nucleic acid molecules for use in the 
method of testing of the invention, wherein the pairs of 
nucleic acid molecules are derived from genes encoding 
transferases and/or enzymes for the transport or 
processing of a polysaccharide or oligosaccharide unit 
including wzx or wzy genes, wherein the panel of nucleic 
acid molecules is specific to a particular bacterial O 
antigen. Preferably the particular bacterial O antigen is 
expressed by S. enterica. The panel of nucleic acid 
molecules can include pairs of nucleic acid molecules 
derived from sugar pathway genes where necessary. 

In another aspect, the invention relates to a method 
for testing a food derived sample for the presence of one 
or more particular bacterial O antigens comprising 
contacting the sample with at least one pair of 
oligonucleotide molecules with at least one oligonucleotide 
molecule of the pair being capable of specifically 
hybridising to: (i) a gene encoding an O antigen 
transferase, or (ii) a gene encoding an enzyme for 
transport or processing of the oligosaccharide or 
polysaccharide unit, including a wzx or wzy gene; wherein 
the gene is involved in the synthesis of the particular O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule to specifically hybridise to 
at least one such gene of any bacteria expressing the 
particular bacterial polysaccharide antigen present in the 
sample and detecting any specifically hybridised 
oligonucleotide molecules. Preferably the bacteria are E. 
coli or S. enterica. More preferably, the E. coli are of 
the O111 or O157 serotype. More preferably the S. 
enterica are of the C2 or B serotype. Preferably, the 
method is a polymerase chain reaction method. More 
preferably the oligonucleotide molecules for use in the 
method of the invention are labelled. Even more 
preferably the hybridised oligonucleotide molecules are 
detected by electrophoresis. 

In another aspect the present invention relates to a 
method for testing a faecal derived sample for the presence 
of one or more particular bacterial O antigens comprising 
contacting the sample with at least one pair of 
oligonucleotide molecules with at least one oligonucleotide 
molecule of the pair being capable of specifically 
hybridising to: (i) a gene encoding an O antigen 
transferase, or (ii) a gene encoding an enzyme for 
transport or processing of the oligosaccharide or 
polysaccharide unit, including a wzx or wzy gene; wherein 
said gene is involved in the synthesis of the particular O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule to specifically hybridise to 
at least one of said genes of any bacteria expressing the 
particular bacterial O antigen present in the sample and 
detecting any specifically hybridised oligonucleotide 
molecules. Preferably the bacteria are E. coli or S. 
enterica. More preferably, the E. coli are of the O111 or 
O157 serotype. More preferably, the S. enterica are of 
the C2 or B serotype. Preferably, the method is a 
polymerase chain reaction method. More preferably the 
oligonucleotide molecules for use in the method of the 
invention are labelled. Even more preferably the 
hybridised oligonucleotide molecules are detected by 
electrophoresis. 

In another aspect, the present invention relates to a 
method for testing a sample derived from a patient for the 
presence of one or more particular bacterial O antigens 
comprising contacting the sample with at least one pair of 
oligonucleotide molecules with at least one oligonucleotide 
molecule of the pair being capable of specifically 
hybridising to: (i) a gene encoding an O antigen 
transferase, or (ii) a gene encoding an enzyme for 
transport or processing of the oligosaccharide or 
polysaccharide unit, including a wzx or wzy gene; wherein 
said gene is involved in the synthesis of the particular O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule to specifically hybridise to 
at least one such gene of any bacteria expressing the 
particular bacterial O antigen present in the sample and 
detecting any specifically hybridised oligonucleotide 
molecules. Preferably the bacteria are E. coli or S. 
enterica. More preferably, the E. coli are of the O111 or 
O157 serotype. More preferably, the S. enterica are of 
the C2 or B serotype. Preferably, the method is a 
polymerase chain reaction method. More preferably the 
oligonucleotide molecules for use in the method of the 
invention are labelled. Even more preferably the 
hybridised oligonucleotide molecules are detected by 
electrophoresis. 

In the above described methods it will be understood 
that where pairs of oligonucleotides are used one of the 
oligonucleotide sequences may hybridise to a sequence that 
is not from a transferase, wzx or wzy gene. Further where 
both hybridise to one of these gene products they may 
hybridise to the same or a different one of these genes. 

In addition it will be understood that where cross 
reactivity is an issue a combination of oligonucleotides 
may be chosen to detect a combination of genes to provide 
specificity. 
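
The idea of obtaining specificity from a combination of genes can be sketched as a simple set test. The panel contents and gene labels below are placeholders chosen for illustration only; they are not the panels of the invention.

```python
# Hypothetical panel: an antigen type is called only when every gene in
# its combination gives a hybridisation or amplification signal.
PANEL = {
    "type A": {"transferase_1", "wzx_A"},
    "type B": {"transferase_1", "wzy_B"},
}


def call_types(detected, panel=PANEL):
    """Return the antigen types whose full gene combination was detected."""
    return sorted(name for name, genes in panel.items() if genes <= set(detected))


print(call_types({"transferase_1", "wzx_A"}))  # combination for type A complete
print(call_types({"transferase_1"}))           # shared gene alone calls nothing
```

Here the shared gene alone is cross-reactive between the two hypothetical types, and only the full combination identifies one of them.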

The invention further relates to a diagnostic kit 
which can be used for the detection of bacteria which 
express bacterial polysaccharide antigens and the 
identification of the bacterial polysaccharide type of 
those bacteria. 

Thus in a further aspect, the invention relates to a 
kit comprising a first vial containing a first nucleic 
acid molecule capable of specifically hybridising to: (i) 
a gene encoding a transferase, or (ii) a gene encoding an 
enzyme for transport or processing oligosaccharide or 
polysaccharide, including a wzx or wzy gene, wherein the 
said gene is involved in the synthesis of a bacterial 
polysaccharide. The kit may also provide in the same or a 
separate vial a second specific nucleic acid capable of 
specifically hybridising to: (i) a gene encoding a 
transferase, or (ii) a gene encoding an enzyme for 
transport or processing oligosaccharide or polysaccharide, 
including a wzx or wzy gene, wherein the said gene is 
involved in the synthesis of a bacterial polysaccharide, 
wherein the sequence of the second nucleic acid molecule 
is different from the sequence of the first nucleic acid 
molecule. 

In a further aspect the invention relates to a kit 
comprising a first vial containing a first nucleic acid 
molecule capable of specifically hybridising to: (i) a 
gene encoding a transferase, or (ii) a gene encoding an 
enzyme for transport or processing oligosaccharide or 
polysaccharide including wzx or wzy, wherein the said gene 
is involved in the synthesis of a bacterial O antigen. 
The kit may also provide in the same or a separate vial a 
second specific nucleic acid capable of specifically 
hybridising to: (i) a gene encoding a transferase, or 
(ii) a gene encoding an enzyme for transport or processing 
oligosaccharide or polysaccharide including wzx or wzy, 
wherein the said gene is involved in the synthesis of O 
antigen, wherein the sequence of the second nucleic acid 
molecule is different from the sequence of the first 
nucleic acid molecule. Preferably the first and second 
nucleic acid sequences are derived from E. coli or the 
first and second nucleic acid sequences are derived from 
S. enterica. 

The present inventors provide full length sequence of 
the O157 gene cluster for the first time and recognise 
that from this sequence of this previously uncloned full 
gene cluster appropriate recombinant molecules can be 
generated and inserted for expression to provide expressed 
O157 antigens useful in applications such as vaccines. 

DEFINITIONS 

The phrase "a nucleic acid molecule derived from a 
gene" means that the nucleic acid molecule has a 
nucleotide sequence which is either identical or 
substantially similar to all or part of the identified 
gene. Thus a nucleic acid molecule derived from a gene 
can be a molecule which is isolated from the identified 
gene by physical separation from that gene, or a molecule 
which is artificially synthesised and has a nucleotide 
sequence which is either identical to or substantially 
similar to all or part of the identified gene. While some 
workers consider only the DNA strand with the same 
sequence as the mRNA transcribed from the gene, here 
either strand is intended. 
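
The "either strand" reading of this definition can be sketched as follows. The ungapped scoring and the toy sequences are assumptions made for illustration; the specification does not define "substantially similar" by any particular score.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_identity(oligo, gene):
    """Best ungapped fractional identity of oligo against either strand of gene."""
    oligo = oligo.upper()
    best = 0.0
    for strand in (gene.upper(), reverse_complement(gene)):
        for i in range(len(strand) - len(oligo) + 1):
            window = strand[i:i + len(oligo)]
            same = sum(a == b for a, b in zip(oligo, window))
            best = max(best, same / len(oligo))
    return best


gene = "ATGAAACGTCTGGATTACGCTAA"   # toy sequence, not from the sequence listing
probe = "CGTCTGGATT"               # identical to part of the given strand

print(best_identity(probe, gene))
print(best_identity(reverse_complement(probe), gene))  # matches the other strand
```

Both probes score as identical to part of the gene, one on each strand, so both would count as "derived from" the gene in the sense used here.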

Transferase genes are regions of nucleic acid which 
have a nucleotide sequence which encodes gene products 
that transfer monomeric sugar units. 

Flippase or wzx genes are regions of nucleic acid 
which have a nucleotide sequence which encodes a gene 
product that flips oligosaccharide repeat units generally 
composed of three to six monomeric sugar units to the 
external surface of the membrane. 

Polymerase or wzy genes are regions of nucleic acid 
which have a nucleotide sequence which encodes gene 
products that polymerise repeating oligosaccharide units 
generally composed of 3-6 monomeric sugar units. 

The nucleotide sequences provided in this 
specification are described in the sequence listing as 
anti-sense sequences. This term is used in the same 
manner as it is used in Glossary of Biochemistry and 
Molecular Biology, Revised Edition, David M. Glick, 1997, 
Portland Press Ltd., London, on page 11, where the term is 
described as referring to one of the two strands of 
double-stranded DNA, usually that which has the same 
sequence as the mRNA. We use it to describe this strand 
which has the same sequence as the mRNA. 

NOMENCLATURE 

Synonyms for E. coli O111 rfb 

Current names    Our names    Bastin et al. 1991 
wbdH             orf1 
gmd              orf2 
wbdI             orf3         orf3.4* 
manC             orf4         rfbM* 
manB             orf5         rfbK* 
wbdJ             orf6         orf6.7* 
wbdK             orf7         orf7.7* 
wzx              orf8         orf8.9 and rfbX* 
wzy              orf9 
wbdL             orf10 
wbdM             orf11 

* Nomenclature according to Bastin D.A., et al. 1991 "Molecular 
cloning and expression in Escherichia coli K-12 of the rfb gene 
cluster determining the O antigen of an E. coli O111 strain". Mol. 
Microbiol. 5(9): 2223-2231. 



Other Synonyms 

wzy     rfc 
wzx     rfbX 
rmlA    rfbA 
rmlB    rfbB 
rmlC    rfbC 
rmlD    rfbD 
glf     orf6* 
wbbI    orf3#, orf8* of E. coli K-12 
wbbJ    orf2#, orf9* of E. coli K-12 
wbbK    orf1#, orf10* of E. coli K-12 
wbbL    orf5#, orf11* of E. coli K-12 

# Nomenclature according to Yao, Z. and M. A. Valvano 1994. 
"Genetic analysis of the O-specific lipopolysaccharide biosynthesis 
region (rfb) of Escherichia coli K-12 W3110: identification of genes 
that confer group-specificity to Shigella flexneri serotypes Y and 
4a". J. Bacteriol. 176: 4133-4143. 

* Nomenclature according to Stevenson et al. 1994. "Structure of 
the O antigen of E. coli K-12 and the sequence of its rfb gene 
cluster". J. Bacteriol. 176: 4144-4156. 

• S. enterica is a name introduced in 1987 to replace the many other 
names such as Salmonella typhi and Salmonella typhimurium, the old 
species names becoming serovar names as in S. enterica sv Typhi. 
However, the traditional names are still widely used. 

• The O antigen genes of many species were given rfb names (rfbA etc) 
and the O antigen gene cluster was often referred to as the rfb 
cluster. There are now new names for the rfb genes as shown in the 
table. Both terminologies have been used herein, depending on the 
source of the information. 

BRIEF DESCRIPTION OF DRAWINGS 

Figure 1 shows EcoRI restriction maps of cosmid 
clones pPR1054, pPR1055, pPR1056, pPR1058 and pPR1287, which 
are subclones of the E. coli O111 O antigen gene cluster. The 
thickened line is the region common to all clones. Broken 
lines show segments that are non-contiguous on the 
chromosome. The deduced restriction map for E. coli 
strain M92 is shown above. 

Figure 2 shows a restriction mapping analysis of the E. 
coli O111 O antigen gene cluster within the cosmid clone 
pPR1058. Restriction enzymes are: B: BamHI; Bg: BglII; 
E: EcoRI; H: HindIII; K: KpnI; P: PstI; S: SalI and X: 
XhoI. Plasmids pPR1230, pPR1231 and pPR1288 are deletion 
derivatives of pPR1058. Plasmids pPR1237, pPR1238, 
pPR1239 and pPR1240 are in pUC19. Plasmids pPR1243, 
pPR1244, pPR1245, pPR1246 and pPR1248 are in pUC18, and 
pPR1292 is in pUC19. Plasmid pPR1270 is in pT7T319U. 
Probes 1, 2 and 3 were isolated as internal fragments of 
pPR1246, pPR1243 and pPR1237 respectively. Dotted lines 
indicate that subclone DNA extends to the left of the map 
into attached vector. 

Figure 3 shows the structure of the E. coli O111 O 
antigen gene cluster. 

Figure 4 shows the structure of the E. coli O157 O 
antigen gene cluster. 

Figure 5 shows the structure of the S. enterica locus 
encoding the serogroup C2 O antigen gene cluster. 

Figure 6 shows the structure of the S. enterica locus 
encoding the serogroup B O antigen gene cluster. 

Figure 7 shows the nucleotide sequence of the E. coli 
O111 O antigen gene cluster. Note: (1) The first and last 
three bases of a gene are underlined and in italics 
respectively; (2) The region which was previously 
sequenced by Bastin and Reeves 1995 "Sequence and analysis 
of the O antigen gene (rfb) cluster of Escherichia coli 
O111" Gene 164: 17-23 is marked. 

Figure 8 shows the nucleotide sequence of the E. coli 
O157 O antigen gene cluster. Note: (1) The first and last 
three bases of a gene (region) are underlined and in italics 
respectively. (2) The region previously sequenced by Bilge 
et al. 1996 "Role of the Escherichia coli O157:H7 O side 
chain in adherence and analysis of an rfb locus". Inf. and 
Immun. 64: 4795-4801 is marked. 

Figure 9 shows the nucleotide sequence of the S. enterica 
serogroup C2 O antigen gene cluster. Note: 
(1) The numbering is as in Brown et al. 1992. "Molecular 
analysis of the rfb gene cluster of Salmonella serovar 
muenchen (strain M67): the genetic basis of the 
polymorphism between groups C2 and B". Mol. Microbiol. 6: 
1385-1394. (2) The first and last three bases of a gene are 
underlined and in italics respectively. (3) Only that part 
of the group C2 gene cluster which differs from that of 
group B was sequenced and is presented here. 

Figure 10 shows the nucleotide sequence of the S. enterica 
serogroup B O antigen gene cluster. Note: (1) The numbering is as 
in Jiang et al. 1991. "Structure and sequence of the rfb (O 
antigen) gene cluster of Salmonella serovar typhimurium (strain 
LT2)". Mol. Microbiol. 5: 695-713. The first gene in the O 
antigen gene cluster is rmlB which starts at base 4099. (2) The 
first and last three bases of a gene are underlined and in 
italics respectively. 

BEST METHOD FOR CARRYING OUT THE INVENTION 

Materials and Methods - part 1 

The experimental procedures for the isolation and 
characterisation of the E. coli O111 O antigen gene 
cluster (position 3,021-9,981) are according to Bastin 
D.A., et al. 1991 "Molecular cloning and expression in 
Escherichia coli K-12 of the rfb gene cluster determining 
the O antigen of an E. coli O111 strain". Mol. Microbiol. 
5(9): 2223-2231 and Bastin D.A. and Reeves, P.R. 1995 
"Sequence and analysis of the O antigen gene (rfb) cluster 
of Escherichia coli O111". Gene 164: 17-23. 

A. Bacterial strains and growth media 

Bacteria were grown in Luria broth supplemented as 
required. 

B. Cosmids and phage 

Cosmids in the host strain x2819 were repackaged in 
vivo. Cells were grown in 250mL flasks containing 30mL of 
culture, with moderate shaking at 30°C to an optical 
density of 0.3 at 580 nm. The defective lambda prophage 
was induced by heating in a water bath at 45°C for 15min 
followed by an incubation at 37°C with vigorous shaking 
for 2hr. Cells were then lysed by the addition of 0.3mL 
chloroform and shaking for a further 10min. Cell debris 
was removed from 1mL of lysate by a 5min spin in a 
microcentrifuge, and the supernatant removed to a fresh 
microfuge tube. One drop of chloroform was added and then 
shaken vigorously through the tube contents. 

C. DNA preparation 

Chromosomal DNA was prepared from bacteria grown 
overnight at 37°C in a volume of 30mL of Luria broth. 
After harvesting by centrifugation, cells were washed and 
resuspended in 10mL of 50mM Tris-HCl pH 8.0. EDTA was 
added and the mixture incubated for 20min. Then lysozyme 
was added and incubation continued for a further 10min. 
Proteinase K, SDS, and ribonuclease were then added and 
the mixture incubated for up to 2hr for lysis to occur. 
All incubations were at 37°C. The mixture was then heated 
to 65°C and extracted once with 8mL of phenol at the same 
temperature. The mixture was extracted once with 5mL of 
phenol/chloroform/iso-amyl alcohol at 4°C. Residual 
phenol was removed by two ether extractions. DNA was 
precipitated with 2 vols. of ethanol at 4°C, spooled and 
washed in 70% ethanol, resuspended in 1-2mL of TE and 
dialysed. Plasmid and cosmid DNA was prepared by a 
modification of the Birnboim and Doly method [Birnboim, H. 
C. and Doly, J. (1979) A rapid alkaline extraction 
procedure for screening recombinant plasmid DNA. Nucl. Acid 
Res. 7: 1513-1523]. The volume of culture was 10mL and the 
lysate was extracted with phenol/chloroform/iso-amyl 
alcohol before precipitation with isopropanol. Plasmid 
DNA to be used as vector was isolated on a continuous 
caesium chloride gradient following alkaline lysis of 
cells grown in 1L of culture. 

D. Enzymes and buffers. 

Restriction endonucleases and T4 DNA ligase were 
purchased from Boehringer Mannheim (Castle Hill, NSW, 
Australia) or Pharmacia LKB (Melbourne, VIC Australia). 
Restriction enzymes were used in the recommended 
commercial buffer. 

E. Construction of a gene bank. 

Individual aliquots of M92 chromosomal DNA (strain 
Stoke W, from Statens Serum Institut, 5 Artillerivej , 2300 
Copenhagen S, Denmark) were partially digested with 0.2U 
Sau3AI for 1-15min. Aliquots giving the greatest 
proportion of fragments in the size range of approximately 
40-50kb were selected and ligated to vector pPR691 
previously digested with BamHI and PvuII. Ligation 
mixtures were packaged in vitro with packaging extract. 
The host strain for transduction was x2819 and 
recombinants were selected with kanamycin. 
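
As a rough guide to how large such a gene bank needs to be, the standard Clarke and Carbon estimate can be applied. The genome size of about 4,600kb and the 45kb mean insert used below are assumptions for illustration; neither figure is stated in this specification.

```python
import math


def clones_needed(insert_kb, genome_kb, probability):
    """Clarke-Carbon estimate: N = ln(1 - P) / ln(1 - f), with f = insert/genome."""
    f = insert_kb / genome_kb
    return math.ceil(math.log(1 - probability) / math.log(1 - f))


# Assumed values: 45kb inserts, 4,600kb genome, 99% chance that any given
# locus is represented at least once in the bank.
print(clones_needed(45, 4600, 0.99))
```

Under these assumptions a few hundred independent cosmid clones would be expected to contain any given chromosomal region with high probability.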

F. Serological procedures. 

Colonies were screened for the presence of the O111 
antigen by immunoblotting. Colonies were grown overnight, 
up to 100 per plate, then transferred to nitrocellulose 
discs and lysed with 0.5N HCl. Tween 20 was added to TBS 
at 0.05% final concentration for blocking, incubating and 
washing steps. Primary antibody was E. coli O group 111 
antiserum, diluted 1:800. The secondary antibody was goat 
anti-rabbit IgG labelled with horseradish peroxidase 
diluted 1:5000. The staining substrate was 
4-chloro-1-naphthol. Slide agglutination was performed according to 
the standard procedure. 

G. Recombinant DNA methods. 

Restriction mapping was based on a combination of 
standard methods including single and double digests and 
sub-cloning. Deletion derivatives of entire cosmids were 
produced as follows: aliquots of 1.8ng of cosmid DNA were 
digested in a volume of 20µl with 0.25U of restriction 
enzyme for 5-80min. One half of each aliquot was used to 
check the degree of digestion on an agarose gel. The 
sample which appeared to give a representative range of 
fragments was ligated at 4°C overnight and transformed by 
the CaCl2 method into JM109. Selected plasmids were 
transformed into Sφ174 by the same method. P4657 was 
transformed with pPR1244 by electroporation. 

H. DNA hybridisation 

Probe DNA was extracted from agarose gels by 
electroelution and was nick-translated using [α-32P]-dCTP. 
Chromosomal or plasmid DNA was electrophoresed in 0.8% 
agarose and transferred to a nitrocellulose membrane. The 
hybridisation and pre-hybridisation buffers contained 
either 30% or 50% formamide for low and high stringency 
probing respectively. Incubation temperatures were 42°C 
and 37°C for pre-hybridisation and hybridisation 
respectively. Low stringency washing of filters consisted 
of 3 x 20min washes in 2 x SSC and 0.1% SDS. High- 
stringency washing consisted of 3 x 5min washes in 2 x SSC 
and 0.1% SDS at room temperature, a 1hr wash in 1 x SSC 
and 0.1% SDS at 58°C and 15min wash in 0.1 x SSC and 0.1% 
SDS at 58°C. 
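
The step-down in salt across these washes can be made concrete with a short calculation. It assumes the usual recipe for 20 x SSC (3 M NaCl and 0.3 M trisodium citrate), which is not stated in this specification, and it ignores the sodium contributed by SDS.

```python
NACL_M_20X = 3.0      # mol/L NaCl in 20 x SSC (standard recipe, assumed)
CITRATE_M_20X = 0.3   # mol/L trisodium citrate in 20 x SSC (assumed)


def sodium_mM(ssc_strength):
    """Approximate Na+ concentration (mM) contributed by n x SSC."""
    per_20x = NACL_M_20X + 3 * CITRATE_M_20X   # three Na+ per citrate
    return 1000 * per_20x * ssc_strength / 20


for strength in (2, 1, 0.1):
    print(f"{strength} x SSC: about {sodium_mM(strength):.1f} mM Na+")
```

Lower sodium together with the higher wash temperature destabilises imperfect duplexes, which is what makes the final 0.1 x SSC wash the most stringent step.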

I. Nucleotide sequencing of E. coli O111 O antigen gene 
cluster (position 3,021-9,981) 

Nucleotide sequencing was performed using an ABI 373 
automated sequencer (CA, USA). The region between map 
positions 3.30 and 7.90 was sequenced using 
uni-directional exonuclease III digestion of deletion 
families made in pT7T319U from clones pPR1270 and pPR1272. 
Gaps were filled largely by cloning of selected fragments 
into M13mp18 or M13mp19. The region from map positions 
7.90-10.2 was sequenced from restriction fragments in 
M13mp18 or M13mp19. Remaining gaps in both the regions 
were filled by priming from synthetic oligonucleotides 
complementary to determined positions along the sequence, 
using a single stranded DNA template in M13 or phagemid. 
The oligonucleotides were designed after analysing the 
adjacent sequence. All sequencing was performed by the 
chain termination method. Sequences were aligned using SAP 
[Staden, R., 1982 "Automation of the computer handling of 
gel reading data produced by the shotgun method of DNA 
sequencing". Nuc. Acid Res. 10: 4731-4751; Staden, R., 
1986 "The current status and portability of our sequence 
handling software". Nuc. Acid Res. 14: 217-231]. The 
program NIP [Staden, R. 1982 "An interactive graphics 
program for comparing and aligning nucleic acid and amino 
acid sequence". Nuc. Acid Res. 10: 2951-2961] was used to 
find open reading frames and translate them into proteins. 
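
What finding and translating open reading frames involves can be sketched in a few lines. This is not the NIP program; it is a simplified illustration that assumes the standard genetic code, ATG as the only start codon, and the three forward frames only.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def find_orfs(dna, min_codons=3):
    """Yield (start, end, protein) for ATG...stop ORFs in the forward frames."""
    dna = dna.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    protein = "".join(CODON_TABLE.get(dna[j:j + 3], "X")
                                      for j in range(start, i, 3))
                    yield start, i + 3, protein
                start = None


# Toy sequence, not taken from the sequence listing.
for orf in find_orfs("CCATGAAACGTTAAGG"):
    print(orf)
```

A practical analysis would also scan the reverse complement and allow alternative bacterial start codons; both are omitted here for brevity.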

J. Isolation of clones carrying the E. coli O111 O antigen 
gene cluster 

The E. coli O antigen gene cluster was isolated 
according to the method of Bastin D.A., et al. [1991 
"Molecular cloning and expression in Escherichia coli K-12 
of the rfb gene cluster determining the O antigen of an E. 
coli O111 strain". Mol. Microbiol. 5(9), 2223-2231]. 
Cosmid gene banks of M92 chromosomal DNA were established 
in the in vivo packaging strain x2819. From the genomic 
bank, 3.3 x 10^3 colonies were screened with E. coli O111 
antiserum using an immuno-blotting procedure: 5 colonies 
(pPR1054, pPR1055, pPR1056, pPR1058 and pPR1287) were 
positive. The cosmids from these strains were packaged in 
vivo into lambda particles and transduced into the E. coli 
deletion mutant Sφ174 which lacks all O antigen genes. In 
this host strain, all plasmids gave positive agglutination 
with O111 antiserum. An EcoRI restriction map of the 5 
independent cosmids showed that they have a region of 
approximately 11.5 kb in common (Figure 1). Cosmid 
pPR1058 included sufficient flanking DNA to identify 
several chromosomal markers linked to the O antigen gene 
cluster and was selected for analysis of the O antigen 
gene cluster region. 

K. Restriction mapping of cosmid pPR1058 

Cosmid pPR1058 was mapped in two stages. A 
preliminary map was constructed first, and then the region 
between map positions 0.00 and 23.10 was mapped in detail, 
since it was shown to be sufficient for O111 antigen 
expression. Restriction sites for both stages are shown 
in Figure 2. The region common to the five cosmid clones 
was between map positions 1.35 and 12.95 of pPR1058. 

To locate the O antigen gene cluster within pPR1058, 
pPR1058 cosmid was probed with DNA probes covering O 
antigen gene cluster flanking regions from S. enterica LT2 
and E. coli K-12. Capsular polysaccharide (cps) genes lie 
upstream of the O antigen gene cluster while the gluconate 
dehydrogenase (gnd) gene and the histidine (his) operon 
are downstream, the latter being further from the O 
antigen gene cluster. The probes used were pPR472 
(3.35kb), carrying the gnd gene of LT2, pPR685 (5.3kb) 
carrying two genes of the cps cluster, cpsB and cpsG of 
LT2, and K350 (16.5kb) carrying all of the his operon of 
K-12. Probes hybridised as follows: pPR472 hybridised to 
1.55kb and 3.5kb (including 2.7kb of vector) fragments 
of PstI and HindIII double digests of pPR1246 (a 
HindIII/EcoRI subclone derived from pPR1058, Figure 2), 
which could be located at map positions 12.95-15.1; pPR685 
hybridised to a 4.4kb EcoRI fragment of pPR1058 
(including 1.3kb of vector) located at map position 
0.00-3.05; and K350 hybridised with a 32kb EcoRI fragment of 
pPR1058 (including 4.0kb of vector), located at map 
position 17.30-45.90. Subclones containing the presumed 
gnd region complemented a gnd-edd- strain GB23152. On 
gluconate bromothymol blue plates, pPR1244 and pPR1292 in 
this host strain gave the green colonies expected of a 
gnd+edd- genotype. The his+ phenotype was restored by 
plasmid pPR1058 in the his deletion strain Sφ174 on 
minimal medium plates, showing that the plasmid carries 
the entire his operon. 

It is likely that the O antigen gene cluster region 
lies between gnd and cps, as in other E. coli and S. 
enterica strains, and hence between the approximate map 
positions 3.05 and 12.95. To confirm this, deletion 
derivatives of pPR1058 were made as follows: first, 
pPR1058 was partially digested with HindIII and self 
ligated. Transformants were selected for kanamycin 
resistance and screened for expression of O111 antigen. 
Two colonies gave a positive reaction. EcoRI digestion 
showed that the two colonies hosted identical plasmids, 
one of which was designated pPR1230, with an insert which 
extended from map positions 0.00 to 23.10. Second, pPR1058 
was digested with SalI and partially digested with XhoI 
and the compatible ends were re-ligated. Transformants 
were selected with kanamycin and screened for O111 antigen 
expression. Plasmid DNA of 8 positively reacting clones 
was checked using EcoRI and XhoI digestion and appeared to 
be identical. The cosmid of one was designated pPR1231. 
The insert of pPR1231 contained the DNA region between map 
positions 0.00 and 15.10. Third, pPR1231 was partially 
digested with XhoI, self-ligated, and transformants 
selected on spectinomycin/streptomycin plates. Clones 
were screened for kanamycin sensitivity and of 10 
selected, all had the DNA region from the XhoI site in the 
vector to the XhoI site at position 4.00 deleted. These 
clones did not express the O111 antigen, showing that the 
XhoI site at position 4.00 is within the O antigen gene 
cluster. One clone was selected and named pPR1288. 
Plasmids pPR1230, pPR1231, and pPR1288 are shown in Figure 
2. 
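
The reasoning of this deletion analysis can be restated as interval arithmetic on map positions (in kb). The sketch below uses the positions given above; the retained region of pPR1288 (4.00 to 15.10) is an inference from the deletion described, and the function names are hypothetical.

```python
# (name, insert start, insert end, expresses O111 antigen)
clones = [
    ("pPR1230", 0.00, 23.10, True),
    ("pPR1231", 0.00, 15.10, True),
    ("pPR1288", 4.00, 15.10, False),  # retained region inferred from the text
]


def sufficient_region(clones):
    """Smallest interval shared by all clones that express the antigen."""
    positive = [(s, e) for _, s, e, ok in clones if ok]
    return max(s for s, _ in positive), min(e for _, e in positive)


def insufficient_regions(clones):
    """Intervals carried by clones that fail to express the antigen."""
    return [(s, e) for _, s, e, ok in clones if not ok]


print(sufficient_region(clones))
print(insufficient_regions(clones))
```

Taken together, the two results say that positions 0.00 to 15.10 are sufficient for expression while 4.00 to 15.10 alone is not, so part of the cluster must lie to the left of position 4.00.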



L. Analysis of the E. coli O111 O antigen gene 
cluster (position 3,021-9,981) nucleotide sequence data 

Bastin and Reeves [1995 "Sequence and analysis of the 
O antigen gene (rfb) cluster of Escherichia coli O111". Gene 
164: 17-23] partially characterised the E. coli O111 O 
antigen gene cluster by sequencing a fragment from map 
position 3,021-9,981. Figure 3 shows the gene 
organisation of position 3,021-9,981 of the E. coli O111 O 
antigen gene cluster. orf3 and orf6 have high level amino 
acid identity with wcaH and wcaG (46.3% and 37.2% 
respectively), and are likely to be similar in function to 
sugar biosynthetic pathway genes in the E. coli K-12 
colanic acid gene cluster. orf4 and orf5 show high levels of 
amino acid homology to manC and manB genes respectively. 
orf7 shows high level homology with rfbH which is an 
abequose pathway gene. orf8 encodes a protein with 12 
transmembrane segments and has similarity in secondary 
structure to other wzx genes and is likely therefore to be 
the O antigen flippase gene. 

Materials and Methods - part 2

A. Nucleotide sequencing of 1 to 3,020 and 9,982 to
14,516 of the E. coli O111 O antigen gene cluster

The sub clones which contained novel nucleotide
sequences, pPR1231 (map position 0 and 1,510), pPR1237
(map position -300 to 2,744), pPR1239 (map position 2,744
to 4,168), pPR1245 (map position 9,736 to 12,007) and
pPR1246 (map position 12,007 to 15,300) (Figure 2), were
characterised as follows: the distal ends of the inserts
of pPR1237, pPR1239 and pPR1245 were sequenced using the
M13 forward and reverse primers located in the vector.
PCR walking was carried out to sequence further into each
insert using primers based on the sequence data, and the
primers were tagged with M13 forward or reverse primer
sequences for sequencing. This PCR walking procedure was
repeated until the entire insert was sequenced. pPR1246
was characterised from position 12,007 to 14,516. The DNA
of these sub clones was sequenced in both directions. The
sequencing reactions were performed using the dideoxy
termination method and thermocycling, and reaction products
were analysed using fluorescent dye and an ABI automated
sequencer (CA, USA).

B. Analysis of the E. coli O111 O antigen gene cluster
(positions 1 to 3,020 and 9,982 to 14,516 of SEQ ID NO:1)
nucleotide sequence data
The gene organisation of regions of the E. coli O111 O
antigen gene cluster which were not characterised by
Bastin and Reeves [1995 "Sequence and analysis of the O
antigen gene (rfb) cluster of Escherichia coli O111." Gene
164: 17-23], (positions 1 to 3,020 and 9,982 to 14,516) is
shown in Figure 3. There are two open reading frames in
region 1. Four open reading frames are predicted in
region 2. The position of each gene is listed in Table 5.

The deduced amino acid sequence of orf1 (wbdH) shares
about 64% similarity with that of the rfp gene of Shigella
dysenteriae. Rfp and WbdH have very similar
hydrophobicity plots and both have a very convincing
predicted transmembrane segment in a corresponding
position. rfp is a galactosyl transferase involved in the
synthesis of LPS core, thus wbdH is likely to be a
galactosyl transferase gene. orf2 has 85.7% identity at
amino acid level to the gmd gene identified in the E. coli
K-12 colanic acid gene cluster and is likely to be a gmd
gene. orf9 encodes a protein with 10 predicted
transmembrane segments and a large cytoplasmic loop.
This inner membrane topology is a characteristic feature
of all known O antigen polymerases, thus it is likely that
orf9 is an O antigen polymerase gene, wzy. orf10
(wbdL) has a deduced amino acid sequence with low homology
with Lsi2 of Neisseria gonorrhoeae. Lsi2 is responsible
for adding GlcNAc to galactose in the synthesis of
lipooligosaccharide. Thus it is likely that wbdL is
either a colitose or glucose transferase gene. orf11
(wbdM) shares high level nucleotide and amino acid
similarity with TrsE of Yersinia enterocolitica. TrsE is
a putative sugar transferase, thus it is likely that wbdM
encodes the colitose or glucose transferase.

In summary, three putative transferase genes and an O
antigen polymerase gene were identified at map position 1
to 3,020 and 9,982 to 14,516 of the E. coli O111 O antigen
gene cluster. A search of GenBank has shown that there
are no genes with significant similarity at the nucleotide
sequence level for two of the three putative transferase
genes or the polymerase gene. SEQ ID NO:1 and Figure 7
provide the nucleotide sequence of the O111 antigen gene
cluster.

Materials and Methods - part 3

A. PCR amplification of O157 antigen gene cluster from
an E. coli O157:H7 strain (Strain C664-1992, from Statens
Serum Institut, 5 Artillerivej, 2300, Copenhagen S,
Denmark)

E. coli O157 O antigen gene cluster was amplified by
using long PCR [Cheng et al. 1994, "Effective amplification
of long targets from cloned inserts and human and genomic
DNA" P.N.A.S. USA 91: 5695-569] with one primer (primer
#412: att ggt agc tgt aag cca agg gcg gta gcg t) based on
the JumpStart sequence usually found in the promoter
region of O antigen gene clusters [Hobbs, et al. 1994 "The
JumpStart sequence: a 39 bp element common to several
polysaccharide gene clusters" Mol. Microbiol. 12: 855-856],
and another primer #482 (cac tgc cat acc gac gac gcc gat
ctg ttg ctt gg) based on the gnd gene usually found
downstream of the O antigen gene cluster. Long PCR was
carried out using the Expand Long Template PCR System from
Boehringer Mannheim (Castle Hill NSW Australia), and
products, 14 kb in length, from several reactions were
combined and purified using the Promega Wizard PCR Preps
DNA Purification System (Madison WI USA). The PCR product
was then extracted with phenol and twice with ether,
precipitated with 70% ethanol, and resuspended in 40 µL of
water.
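
As a quick sanity check on the two long-PCR primers quoted above, the following minimal sketch strips the spacing from each sequence and reports its length and GC content. It assumes that the triplets printed as "age" and "gec" stand for "agc" and "gcc" (the letter "e" is not a nucleotide symbol); the function names are illustrative only.

```python
# Primers #412 and #482 as quoted in the text. ASSUMPTION: the printed
# triplets "age" and "gec" are read here as "agc" and "gcc".
PRIMERS = {
    "#412": "att ggt agc tgt aag cca agg gcg gta gcg t",
    "#482": "cac tgc cat acc gac gac gcc gat ctg ttg ctt gg",
}


def normalise(seq: str) -> str:
    """Remove spacing, upper-case, and reject non-nucleotide characters."""
    cleaned = "".join(seq.split()).upper()
    unexpected = set(cleaned) - set("ACGT")
    if unexpected:
        raise ValueError(f"non-nucleotide characters: {sorted(unexpected)}")
    return cleaned


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    cleaned = normalise(seq)
    return (cleaned.count("G") + cleaned.count("C")) / len(cleaned)


for name, seq in PRIMERS.items():
    cleaned = normalise(seq)
    print(name, len(cleaned), "nt", f"GC = {gc_fraction(seq):.2f}")
```

A check of this kind catches transcription errors (any character outside A, C, G, T raises an error) before a primer is ordered or used in a search.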

B. Construction of a random DNase I bank: 

Two aliquots containing about 150 ng of DNA each were
subjected to DNase I digestion using the Novagen DNase I
Shotgun Cleavage Kit (Madison WI USA) with a modified
protocol as described. Each aliquot was diluted into 45 µL
of 0.05 M Tris-HCl (pH 7.5), 0.05 mg/mL BSA and 10 mM
MnCl2. 5 µL of 1:3000 or 1:4500 dilution of DNase I
(Novagen) (Madison WI USA) in the same buffer was added
into each tube respectively, and 10 µL of stop buffer
(100 mM EDTA, 30% glycerol, 0.5% Orange G, 0.075% xylene
cyanol) (Novagen) (Madison WI USA) was added after
incubation at 15°C for 5 min. The DNA from the two DNase I
reaction tubes was then combined and fractionated on a 0.8%
LMT agarose gel, and the gel segment with DNA of about 1 kb
in size (about 1.5 mL agarose) was excised. DNA was
extracted from agarose using Promega Wizard PCR Preps DNA
Purification (Madison WI USA) and resuspended in 200 µL
water, before being extracted with phenol and twice with
ether, and precipitated. The DNA was then resuspended in
17.25 µL water and subjected to T4 DNA polymerase repair
and single dA tailing using the Novagen Single dA Tailing
Kit (Madison WI USA). The reaction product (85 µL
containing about 8 ng DNA) was then extracted with
chloroform:isoamyl alcohol (24:1) once and ligated to 3 x
10^-3 pmol pGEM-T (Promega) (Madison WI USA) in a total
volume of 100 µL. Ligation was carried out overnight at
4°C and the ligated DNA was precipitated and resuspended
in 20 µL water before being electroporated into E. coli
strain JM109 and plated out on BCIG-IPTG plates to give a
bank.
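
The ligation above combines about 8 ng of roughly 1 kb fragments with 3 x 10^-3 pmol of vector. The sketch below converts the insert mass to picomoles to estimate the insert-to-vector molar ratio. It assumes an average mass of 650 g/mol per base pair of double-stranded DNA, a common rule of thumb that is not stated in the text, and treats every fragment as exactly 1,000 bp.

```python
# ASSUMPTION: average molar mass of one base pair of dsDNA (rule of thumb).
AVG_BP_MASS_G_PER_MOL = 650.0


def dsdna_pmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    grams = mass_ng * 1e-9
    moles = grams / (length_bp * AVG_BP_MASS_G_PER_MOL)
    return moles * 1e12


# Values taken from the text: about 8 ng of ~1 kb fragments, 3e-3 pmol vector.
insert_pmol = dsdna_pmol(mass_ng=8, length_bp=1000)
vector_pmol = 3e-3
ratio = insert_pmol / vector_pmol

print(f"insert: {insert_pmol:.4f} pmol, insert:vector ratio about {ratio:.1f}:1")
```

Under these assumptions the insert is in roughly four-fold molar excess over the vector; the true ratio depends on the actual fragment size distribution.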

C. Sequencing

DNA templates from clones of the bank were prepared
for sequencing using the 96-well format plasmid DNA
miniprep kit from Advanced Genetic Technologies Corp
(Gaithersburg MD USA). The inserts of these clones were
sequenced from one or both ends using the standard M13
sequencing primer sites located in the pGEM-T vector.
Sequencing was carried out on an ABI377 automated
sequencer (CA USA) as described above, after carrying out
the sequencing reaction on an ABI Catalyst (CA USA).
Sequence gaps and areas of inadequate coverage were PCR
amplified directly from O157 chromosomal DNA using primers
based on the already obtained sequencing data and
sequenced using the standard M13 sequencing primer sites
attached to the PCR primers.

D. Analysis of the E. coli O157 O antigen gene cluster
nucleotide sequence data

Sequence data were processed and analysed using the
Staden programs [Staden, R., 1982 "Automation of the
computer handling of gel reading data produced by the
shotgun method of DNA sequencing." Nuc. Acid Res. 10:
4731-4751; Staden, R., 1986 "The current status and
portability of our sequence handling software". Nuc. Acid
Res. 14: 217-231; Staden, R. 1982 "An interactive graphics
program for comparing and aligning nucleic acid and amino
acid sequence". Nuc. Acid Res. 10: 2951-2961]. Figure 4
shows the structure of the E. coli O157 O antigen gene
cluster. Twelve open reading frames were predicted from
the sequence data, and the nucleotide and amino acid
sequences of all these genes were then used to search the
GenBank database for indication of possible function and
specificity of these genes. The position of each gene is
listed in Table 6. The nucleotide sequence is presented
in SEQ ID NO: 2 and Figure 8.
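
To illustrate what predicting open reading frames from sequence data involves, the sketch below scans the three forward reading frames of a sequence for ATG-initiated stretches that end at an in-frame stop codon. It is a generic illustration only, not the method of the Staden programs; the minimum length threshold and the toy sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon. Only ORFs with at least min_codons codons before the
    stop are reported. min_codons is an illustrative threshold.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume scanning after the stop codon
    return orfs


# Toy sequence: two leading bases, ATG, five GCT codons, TAA, two trailing bases.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy, min_codons=3))
```

A full analysis would also scan the reverse complement and, as described in the text, compare each predicted product against database sequences to assign a likely function.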

orfs 10 and 11 showed high level identity to manC and
manB and were named manC and manB respectively. orf7
showed 89% identity (at amino acid level) to the gmd gene
of the E. coli colanic acid capsule gene cluster
(Stevenson G., K. et al. 1996 "Organisation of the
Escherichia coli K-12 gene cluster responsible for
production of the extracellular polysaccharide colanic
acid". J. Bacteriol. 178:4885-4893) and was named gmd.
orf8 showed 79% and 69% identity (at amino acid level)
respectively to wcaG of the E. coli colanic acid capsule
gene cluster and to the wbcJ (orf14.8) gene of the Yersinia
enterocolitica O8 O antigen gene cluster (Zhang, L. et al.
1997 "Molecular and chemical characterization of the
lipopolysaccharide O-antigen and its role in the virulence
of Yersinia enterocolitica serotype O8". Mol. Microbiol.
23:63-76). Colanic acid and the Yersinia O8 O antigen both
contain fucose as does the O157 O antigen. There are two
enzymatic steps required for GDP-L-fucose synthesis from
GDP-4-keto-6-deoxy-D-mannose, the product of the reaction
catalysed by the gmd gene product. However, it has been
shown recently (Tonetti, M. et al. 1996 "Synthesis of
GDP-L-fucose by the human FX protein" J. Biol. Chem.
271:27274-27279) that the human FX protein has
"significant homology" with the wcaG gene (referred to as
Yefb in that paper), and that the FX protein carries out
both reactions to convert GDP-4-keto-6-deoxy-D-mannose to
GDP-L-fucose. We believe that this makes a very strong
case for orf8 carrying out these two steps and propose to
name the gene fcl. In support of the one enzyme carrying
out both functions is the observation that there are no
genes other than manB, manC, gmd and fcl with similar
levels of similarity between the three bacterial gene
clusters for fucose containing structures.

orf5 is very similar to wbeE (rfbE) of Vibrio
cholerae O1, which is thought to be the perosamine
synthetase, which converts GDP-4-keto-6-deoxy-D-mannose to
GDP-perosamine (Stroeher, U.H. et al. 1995 "A putative
pathway for perosamine biosynthesis is the first function
encoded within the rfb region of Vibrio cholerae O1". Gene
166: 33-42). V. cholerae O1 and E. coli O157 O antigens
contain perosamine and N-acetyl-perosamine respectively.
The V. cholerae O1 manA, manB, gmd and wbeE genes are the
only genes of the V. cholerae O1 gene cluster with
significant similarity to genes of the E. coli O157 gene
cluster and we believe that our observations both confirm
the prediction made for the function of wbeE of V.
cholerae, and show that orf5 of the O157 gene cluster
encodes GDP-perosamine synthetase. orf5 is therefore
named per. orf5 plus about 100 bp of the upstream region
(position 4022-5308) was previously sequenced by Bilge, S.S.
et al. [1996 "Role of the Escherichia coli O157:H7 O side
chain in adherence and analysis of an rfb locus". Infect.
Immun. 64:4795-4801].

orf12 shows high level similarity to the conserved
region of about 50 amino acids of various members of an
acetyl transferase family (Lin, W., et al. 1994 "Sequence
analysis and molecular characterisation of genes required
for the biosynthesis of type 1 capsular polysaccharide in
Staphylococcus aureus". J. Bacteriol. 176: 7005-7016) and
we believe it is the N-acetyl transferase which converts
GDP-perosamine to GDP-perNAc. orf12 has been named wbdR.

The genes manB, manC, gmd, fcl, per and wbdR account
for all of the expected biosynthetic pathway genes of the
O157 gene cluster.

The remaining biosynthetic step(s) required are for
synthesis of UDP-GalNAc from UDP-Glc. It has been
proposed (Zhang, L., et al. 1997 "Molecular and chemical
characterisation of the lipopolysaccharide O-antigen and
its role in the virulence of Yersinia enterocolitica
serotype O8". Mol. Microbiol. 23:63-76) that in Yersinia
enterocolitica UDP-GalNAc is synthesised from UDP-GlcNAc
by a homologue of galactose epimerase (GalE), for which
there is a galE like gene in the Yersinia enterocolitica
O8 gene cluster. In the case of O157 there is no galE
homologue in the gene cluster and it is not clear how
UDP-GalNAc is synthesised. It is possible that the
galactose epimerase encoded by the galE gene in the gal
operon can carry out conversion of UDP-GlcNAc to
UDP-GalNAc in addition to conversion of UDP-Glc to
UDP-Gal. There do not appear to be any gene(s) responsible
for UDP-GalNAc synthesis in the O157 gene cluster.

orf4 shows similarity to many wzx genes and is named
wzx. orf2 shows similarity of secondary structure in the
predicted protein to the products of other wzy genes and
is for that reason named wzy.

The orf1, orf3 and orf6 gene products all have
characteristics of transferases, and have been named wbdN,
wbdO and wbdP respectively. The O157 O antigen has 4
sugars and 4 transferases are expected. The first
transferase to act would put a sugar phosphate onto
undecaprenol phosphate. The two transferases known to
perform this function, WbaP (RfbP) and WecA (Rfe), transfer
galactose phosphate and N-acetyl-glucosamine phosphate
respectively to undecaprenol phosphate. Neither of these
sugars is present in the O157 structure.

Further, none of the presumptive transferases in the
O157 gene cluster has the transmembrane segments found in
WecA and WbaP, which transfer a sugar phosphate to
undecaprenol phosphate. Such segments would be expected
for any protein which transferred a sugar to undecaprenol
phosphate, which is embedded within the membrane.

The WecA gene which transfers GlcNAc-P to
undecaprenol phosphate is located in the Enterobacterial
Common Antigen (ECA) gene cluster and it functions in ECA
synthesis in most and perhaps all E. coli strains, and
also in O antigen synthesis for those strains which have
GlcNAc as the first sugar in the O unit.

It appears that WecA acts as the transferase for
addition of GalNAc-1-P to undecaprenol phosphate for the
Yersinia enterocolitica O8 O antigen [Zhang et al. 1997
"Molecular and chemical characterisation of the
lipopolysaccharide O antigen and its role in the virulence
of Yersinia enterocolitica serotype O8" Mol. Microbiol.
23: 63-76] and perhaps does so here as the O157 structure
includes GalNAc. WecA has also been reported to add
glucose-1-P to undecaprenol phosphate in E. coli
O8 and O9 strains, and an alternative possibility for
transfer of the first sugar to undecaprenol phosphate is
WecA mediated transfer of glucose, as there is a glucose
residue in the O157 O antigen. In either case the
requisite number of transferase genes are present if
GalNAc or Glc is transferred by WecA and the side chain
Glc is transferred by a transferase outside of the O
antigen gene cluster.

orf9 shows high level similarity (44% identity at
amino acid level, same length) with the wcaH gene of the E.
coli colanic acid capsule gene cluster. The function of
this gene is unknown, and we give orf9 the name wbdQ.

The DNA between manB and wbdR has strong sequence
similarity to one of the H-repeat units of E. coli K12.
Both of the inverted repeat sequences flanking this region
are still recognisable, each with two of the 11 bases
being changed. The H-repeat associated protein encoding
gene located within this region has a 267 base deletion
and mutations in various positions. It seems that the
H-repeat unit has been associated with this gene cluster
for a long period of time since it translocated to the gene
cluster, perhaps playing a role in assembly of the gene
cluster as has been proposed in other cases.

Materials and Methods - part 4 
To test our hypothesis that O antigen genes for
transferases and the wzx, wzy genes were more specific
than pathway genes for diagnostic PCR, we first carried
out PCR using primers for all the E. coli O16 O antigen
genes (Table 4). The PCR was then carried out using PCR
primers for E. coli O111 transferase, wzx and wzy genes
(Table 5, 5A). PCR was also carried out using PCR primers
for the E. coli O157 transferase, wzx and wzy genes (Table
6, 6A).

Chromosomal DNA from the 166 serotypes of E. coli
available from Statens Serum Institut, 5 Artillerivej,
2300 Copenhagen Denmark was isolated using the Promega
Genomic isolation kit (Madison WI USA). Note that 164 of
the serogroups are described by Ewing W. H.: Edwards and
Ewings "Identification of the Enterobacteriaceae" Elsevier,
Amsterdam 1986 and that they are numbered 1-171 with
numbers 31, 47, 67, 72, 93, 94 and 122 no longer valid.
Of the two serogroup 19 strains we used 19ab strain
F8188-41. Lior H. 1994 ["Classification of Escherichia
coli" In Escherichia coli in domestic animals and humans
pp 31-72. Edited by C.L. Gyles CAB International] adds two
more numbered 172 and 173 to give the 166 serogroups used.
Pools containing 5 to 8 samples of DNA per pool were made.
Pool numbers 1 to 19 (Table 1) were used in the E. coli
O111 and O157 assay. Pool numbers 20 to 28 were also used
in the O111 assay, and pool numbers 22 to 24 contained E.
coli O111 DNA and were used as positive controls (Table
2). Pool numbers 29 to 42 were also used in the O157
assay, and pool numbers 31 to 36 contained E. coli O157
DNA, and were used as positive controls (Table 3). Pool
numbers 2 to 20, 30, 43 and 44 were used in the E. coli
O16 assay (Tables 1 to 3). Pool number 44 contained DNA
of E. coli K-12 strains C600 and WG1 and was used as a
positive control as between them they have all of the E.
coli K-12 O16 O antigen genes.

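The serogroup arithmetic in the preceding paragraph can be checked directly. The sketch below rebuilds the list of valid O serogroup numbers from the figures given in the text (1 to 171, seven numbers no longer valid, plus 172 and 173); the variable names are illustrative.

```python
# Serogroup numbers stated in the text as no longer valid.
INVALID = {31, 47, 67, 72, 93, 94, 122}

# Serogroups numbered 1-171, minus the invalid numbers.
ewing_serogroups = [n for n in range(1, 172) if n not in INVALID]

# The two further serogroups added by Lior (1994).
lior_additions = [172, 173]

all_serogroups = ewing_serogroups + lior_additions

print(len(ewing_serogroups), len(all_serogroups))
```

The two counts printed correspond to the 164 serogroups attributed to Ewing and the 166 serogroups used in the assays.
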
PCR reactions were carried out under the following
conditions: denaturing, 94°C/30"; annealing, temperature
varies (refer to Tables 4 to 8)/30"; extension, 72°C/1';
30 cycles. PCR reaction was carried out in a volume of
25 µL for each pool. After the PCR reaction, 10 µL PCR
product from each pool was run on an agarose gel to check
for amplified DNA.
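
The cycling conditions above can be written down as a small data structure, which makes the per-primer variation (the annealing temperature) explicit. The sketch below is illustrative only: the class and function names are assumptions, the 55°C annealing temperature is a placeholder (the actual values are in Tables 4 to 8), and the time estimate ignores ramping between temperatures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def cycling_programme(annealing_temp_c: float):
    """Three-step cycle from the text: 94°C/30 s, anneal/30 s, 72°C/60 s."""
    return (
        Step("denature", 94.0, 30),
        Step("anneal", annealing_temp_c, 30),
        Step("extend", 72.0, 60),
    )


def hold_time_seconds(steps, cycles: int = 30) -> int:
    """Total programmed hold time, excluding temperature ramps."""
    return cycles * sum(step.seconds for step in steps)


# 55°C is a hypothetical annealing temperature used only for illustration.
programme = cycling_programme(annealing_temp_c=55.0)
print(hold_time_seconds(programme), "seconds of programmed hold time")
```

With 30 cycles of 120 s each, the programmed hold time is one hour regardless of the annealing temperature chosen for a given primer pair.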

Each E. coli and S. enterica chromosomal DNA sample
was checked by gel electrophoresis for the presence of
chromosomal DNA and by PCR amplification of the E. coli or
S. enterica mdh gene using oligonucleotides based on E.
coli K-12 or Salmonella enterica LT2 [Boyd et al. (1994)
"Molecular genetic basis of allelic polymorphism in malate
dehydrogenase (mdh) in natural populations of Escherichia
coli and Salmonella enterica" Proc. Nat. Acad. Sci. USA.
91:1280-1284]. Chromosomal DNA samples from other
bacteria were only checked by gel electrophoresis of
chromosomal DNA.

A. Primers based on the E. coli O16 O antigen gene cluster
sequence.

The O antigen gene cluster of E. coli O16 was the
only typical E. coli O antigen gene cluster that had been
fully sequenced prior to that of O111, and we chose it for
testing our hypothesis. One pair of primers for each gene
was tested against pools 2 to 20, 30 and 43 of E. coli
chromosomal DNA. The primers, annealing temperatures and
functional information for each gene are listed in Table
4.

For the five pathway genes, there were 17/21, 13/21,
0/21, 0/21, 0/21 positive pools for rmlB, rmlD, rmlA, rmlC
and glf respectively (Table 4). For the wzx, wzy and
three transferase genes there were no positives amongst
the 21 pools of E. coli chromosomal DNA tested (Table 4).
In each case the #44 pool gave a positive result.

B. Primers based on the E. coli O111 O antigen gene
cluster sequence.

One to four pairs of primers for each of the
transferase, wzx and wzy genes of O111 were tested against
the pools 1 to 21 of E. coli chromosomal DNA (Table 5).
For wbdH, four pairs of primers, which bind to various
regions of this gene, were tested and found to be specific
for O111 as there was no amplified DNA of the correct size
in any of those 21 pools of E. coli chromosomal DNA
tested. Three pairs of primers for wbdM were tested, and
they are all specific although primers #985/#986 produced
a band of the wrong size from one pool. Three pairs of
primers for wzx were tested and they all were specific.
Two pairs of primers were tested for wzy; both are
specific although #980/#983 gave a band of the wrong size
in all pools. One pair of primers for wbdL was tested and
found to be non-specific and therefore no further test was
carried out. Thus, wzx, wzy and two of the three
transferase genes are highly specific to O111. Bands of
the wrong size found in amplified DNA are assumed to be due
to chance hybridisation of genes widely present in E. coli.
The primers, annealing temperatures and positions for each
gene are listed in Table 5.

The O111 assay was also performed using pools
including DNA from O antigen expressing Yersinia
pseudotuberculosis, Shigella boydii and Salmonella
enterica strains (Table 5A). None of the oligonucleotides
derived from wbdH, wzx, wzy or wbdM gave amplified DNA of
the correct size with these pools. Notably, pool number
25 includes S. enterica Adelaide which has the same O
antigen as E. coli O111: this pool did not give a positive
PCR result for any primers tested, indicating that these
genes are highly specific for E. coli O111.

Each of the 12 pairs binding to wbdH, wzx, wzy and
wbdM produces a band of predicted size with the pools
containing O111 DNA (pools number 22 to 24). As pools 22
to 24 included DNA from all strains present in pool 21
plus O111 strain DNA (Table 2), we conclude that the 12
pairs of primers all give a positive PCR test with each of
three unrelated O111 strains but not with any other
strains tested. Thus these genes are highly specific for
E. coli O111.
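
The scoring rule used in this section (a pool counts as positive only when a band of the predicted size is present, so bands of the wrong size are disregarded) can be sketched as follows, together with a tally of the O111-specific primer pairs reported above. The band sizes, the expected product size and the tolerance are hypothetical values chosen for illustration; the primer pair counts are those given in the text.

```python
# Primer pairs per gene reported in the text as specific for O111.
SPECIFIC_PAIRS = {"wbdH": 4, "wbdM": 3, "wzx": 3, "wzy": 2}


def is_positive(band_sizes_bp, expected_bp, tolerance_bp=50):
    """Score a pool as positive only if a band lies near the expected size.

    tolerance_bp is an assumed gel-reading tolerance, not a value from the text.
    """
    return any(abs(band - expected_bp) <= tolerance_bp for band in band_sizes_bp)


total_pairs = sum(SPECIFIC_PAIRS.values())
print("specific primer pairs:", total_pairs)

# Hypothetical gel readings for a primer pair with an 800 bp expected product.
print(is_positive([1500], expected_bp=800))       # wrong-size band only
print(is_positive([1500, 810], expected_bp=800))  # wrong-size band plus product
```

The tally reproduces the 12 primer pairs referred to in the text (4 + 3 + 3 + 2).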

C. Primers based on the E. coli O157 O antigen gene
cluster sequence.

Two or three primer pairs for each of the
transferase, wzx and wzy genes of O157 were tested against
E. coli chromosomal DNA of pools 1 to 19, 29 and 30 (Table
6). For wbdN, three pairs of primers, which bind to
various regions of this gene, were tested and found to be
specific for O157 as there was no amplified DNA in any of
those 21 pools of E. coli chromosomal DNA tested. Three
pairs of primers for wbdO were tested, and they are all
specific although primers #1211/#1212 produced two or
three bands of the wrong size from all pools. Three pairs
of primers were tested for wbdP and they all were
specific. Two pairs of primers were tested for wbdR and
they were all specific. For wzy, three pairs of primers
were tested and all were specific although primer pair
#1203/#1204 produced one or three bands of the wrong size
in each pool. For wzx, two pairs of primers were tested
and both were specific although primer pair #1217/#1218
produced 2 bands of wrong size in 2 pools, and 1 band of
wrong size in 7 pools. Bands of the wrong size found in
amplified DNA are assumed to be due to chance
hybridisation of genes widely present in E. coli. The
primers, annealing temperatures and function information
for each gene are in Table 6.

The O157 assay was also performed using pools 37 to
42, including DNA from O antigen expressing Yersinia
pseudotuberculosis, Shigella boydii, Yersinia
enterocolitica O9, Brucella abortus and Salmonella
enterica strains (Table 6A). None of the oligonucleotides
derived from wbdN, wzy, wbdO, wzx, wbdP or wbdR reacted
specifically with these pools, except that primer pair
#1203/#1204 produced two bands with Y. enterocolitica O9
and one of the bands is of the same size as that from
the positive control. Primer pair #1203/#1204 binds to
wzy. The predicted secondary structures of Wzy proteins
are generally similar, although there is very low
similarity at amino acid or DNA level among the sequenced
wzy genes. Thus, it is possible that Y. enterocolitica
O9 has a wzy gene closely related to that of E. coli O157.
It is also possible that this band is due to chance
hybridization of another gene, as the other two wzy primer
pairs (#1205/#1206 and #1207/#1208) did not produce any
band with Y. enterocolitica O9. Notably, pool number 37
includes S. enterica Landau which has the same O antigen
as E. coli O157, and pools 38 and 39 contain DNA of B.
abortus and Y. enterocolitica O9 which cross react
serologically with E. coli O157. This result indicates
that these genes are highly O157 specific, although one
primer pair may have cross reacted with Y. enterocolitica
O9.

Each of the 16 pairs binding to wbdN, wzx, wzy, wbdO,
wbdP and wbdR produces a band of predicted size with the
pools containing O157 DNA (pools number 31 to 36). As
pool 29 included DNA from all strains present in pools 31
to 36 other than O157 strain DNA (Table 3), we conclude
that the 16 pairs of primers all give a positive PCR test
with each of the five unrelated O157 strains.

Thus PCR using primers based on genes wbdN, wzy,
wbdO, wzx, wbdP and wbdR is highly specific for E. coli
O157, giving positive results with each of six unrelated
O157 strains while only one primer pair gave a band of the
expected size with one of three strains with O antigens
known to cross-react serologically with E. coli O157.



D. Primers based on the Salmonella enterica serotype C2
and B O antigen gene cluster sequences.

We also performed a PCR using primers for the S.
enterica C2 and B serogroup transferase, wzx and wzy
genes (Tables 7 to 9). The nucleotide sequences of C2
and B O antigen gene clusters are listed as SEQ ID NO: 3
(Fig. 9) and SEQ ID NO: 4 (Fig. 10) respectively.
Chromosomal DNA from all the 46 serotypes of Salmonella
enterica (Table 9) was isolated using the Promega Genomic
isolation kit, and 7 pools of 4 to 8 samples per pool were
made. Salmonella enterica serotype B or C2 DNA was
omitted from the pool for testing primers of 46 respective
serotypes but added to a pool containing 6 other samples
to give pool number 8 for use as a positive control.
PCR reactions were carried out under the following
conditions: denaturing, 94°C/30"; annealing, temperature
varies (see below)/30"; extension, 72°C/1'; 30 cycles.
PCR reaction was carried out in a volume of 25 µL for each
pool. After the PCR reaction, 10 µL PCR product from each
pool was run on an agarose gel to check for amplified DNA.
For pools which gave a band of correct size, PCR was
repeated using individual chromosomal samples of that
pool, and agarose gel was run to check for amplified DNA
from each sample.
The Salmonella enterica serotype B O antigen gene
cluster (of strain LT2) was the first O antigen gene
cluster to be fully sequenced, and the function of each
gene has been identified experimentally [Jiang, X. M.,
Neal, B., Santiago, F., Lee, S. J., Romana, L. K., and
Reeves, P. R. (1991) "Structure and sequence of the rfb (O
antigen) gene cluster of Salmonella serovar typhimurium
(strain LT2)." Mol. Microbiol. 5(3), 695-713; Liu, D.,
Cole, R., and Reeves, P. R. (1996). "An O antigen
processing function for Wzx(RfbX): a promising candidate
for O-unit flippase" J. Bacteriol., 178(7), 2102-2107; Liu,
D., Haase, A. M., Lindqvist, L., Lindberg, A. A., and
Reeves, P. R. (1993). "Glycosyl transferases of O-antigen
biosynthesis in S. enterica: identification and
characterisation of transferase genes of groups B, C2 and
E1." J. Bacteriol., 175, 3408-3413; Liu, D., Lindqvist,
L., and Reeves P. R. (1995). "Transferases of O-antigen
biosynthesis in Salmonella enterica: dideoxyhexosyl
transferases of groups B and C2 and acetyl transferase of
group C2." J. Bacteriol., 177, 4084-4088; Romana, L. K.,
Santiago, F. S., and Reeves, P. R. (1991). "High level
expression and purification of dThymidine-diphospho-D-glucose
4,6 dehydratase (rfbB) from Salmonella serovar typhimurium
LT2." BBRC, 174, 846-852]. One pair of primers for each
of the pathway genes and wbaP was tested against the pools
of Salmonella enterica DNA; two to three pairs of primers
for each of the other transferases and wzx genes were also
tested. See Table 8 for a list of primers and functional
information of each gene, as well as the annealing
temperature of the PCR reaction for each pair of primers.

For pathway genes of group B strain LT2, there are
19/45, 14/45, 15/45, 12/45, 6/45, 6/45, 6/45, 6/45, 1/45,
9/45, 8/45 positives for rmlB, rmlD, rmlA, rmlC, ddhD,
ddhA, ddhB, ddhC, abe, manC, and manB respectively (Table
9).

For the LT2 wzx gene we used three primer pairs each
of which gave 1/45 positive. For the 4 transferase genes
we used a total of 9 primer pairs. 2 primer pairs for
wbaV gave 2/90 positives. For 3 primer pairs of wbaN,
11/135 gave a positive result. For the wbaP primer pair
10/45 gave a positive result (Table 9).
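
The denominators quoted above follow from multiplying the number of primer pairs by the 45 O antigen groups tested. The sketch below tabulates the counts given in the text for the group B (LT2) genes and computes the positive rate for each; the data layout and function name are illustrative, and genes whose counts are not stated in the text are left out.

```python
GROUPS_TESTED = 45  # S. enterica O antigen groups screened per primer pair

# gene -> (number of primer pairs, total positives over those pairs),
# as stated in the text for group B strain LT2.
RESULTS = {
    "rmlB": (1, 19), "rmlD": (1, 14), "rmlA": (1, 15), "rmlC": (1, 12),
    "ddhD": (1, 6), "ddhA": (1, 6), "ddhB": (1, 6), "ddhC": (1, 6),
    "abe": (1, 1), "manC": (1, 9), "manB": (1, 8),
    "wzx": (3, 3), "wbaV": (2, 2), "wbaN": (3, 11), "wbaP": (1, 10),
}


def positive_rate(gene: str):
    """Return (positives, tests, rate) for one gene."""
    pairs, positives = RESULTS[gene]
    tests = pairs * GROUPS_TESTED
    return positives, tests, positives / tests


for gene in sorted(RESULTS, key=lambda g: positive_rate(g)[2], reverse=True):
    positives, tests, rate = positive_rate(gene)
    print(f"{gene:5s} {positives:3d}/{tests:<3d} {rate:6.1%}")
```

Sorting by rate places rmlB, a pathway gene, at the top, with wzx, wbaV and abe at the bottom at 1 positive per 45 tests each.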

The experimental data show that oligonucleotides
derived from the wzx and wbaV group B O antigen genes are
specific for group B O antigen amongst all 45 Salmonella
enterica O antigen groups except O group 67. The
oligonucleotides derived from Salmonella enterica B group
wbaN and wbaU genes detected B group O antigen and also
produced positive results with groups A, D1 and D3. WbaU
encodes a transferase for a Mannose α(1-4) Mannose linkage
and is expressed in groups A, B and D1 while wbaN, which
encodes a transferase for Rhamnose α(1-3) Galactose
linkage, is present in groups A, B, D1, D2, D3 and E1.
This accounts for the positive results with the group B
wbaU and wbaN genes. The wbaN gene of groups E and D2 has
considerable sequence differences from that of groups A,
B, D1 and D3 and this accounts for the positive results
only with groups B, D1 and D3.

The Salmonella enterica B primers derived from wzx
and transferase genes produced a positive result with
Salmonella enterica O67. We find that Salmonella enterica
O67 has all the genes of the group B O antigen cluster.
There are several possible explanations for this finding
including the possibility that the gene cluster is not
functional due to mutation and the group O67 antigenicity
is due to another antigen, or the O antigen is modified
after synthesis such that its antigenicity is changed.
Salmonella enterica O67 would therefore be scored as
Salmonella enterica group B in the PCR diagnostic assay.
However, this is of little importance because Salmonella
enterica O67 is a rare O antigen and only one (serovar
Crossness) of the 2324 known serovars has the O67
serotype [Popoff M.Y. et al (1992) "Antigenic formulas of
the Salmonella enterica serovars" 6th revision WHO
Collaborating Centre for Reference and Research on
Salmonella enterica, Institut Pasteur Paris France], and
serovar Crossness had only been isolated once [M. Popoff,
personal communication].

The Salmonella enterica B primers derived from wbaP 
reacted with the group A, C2, D1, D2, D3, E1, 54, 55, 67 and 
E4 O antigen groups. WbaP encodes the galactosyl 
transferase which initiates O unit synthesis by transfer 
of Galactose phosphate to the lipid carrier Undecaprenol 
phosphate. This reaction is common to the synthesis of 
several O antigens. As such wbaP is distinguished from 
other transferases of the invention as it does not make a 
linkage within an O antigen. 
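
The cross-reactions described above can be read as sets of groups per probe, and the groups consistent with a given combination of positive probes are then the intersection of those sets. The following Python sketch illustrates this reading under stated assumptions: the table `PROBE_REACTIVITY` and the function `candidate_groups` are hypothetical names, the reactivities are transcribed from the preceding paragraphs only, and group B is assumed to be positive for every probe derived from its own genes.

```python
# Groups reported above as positive with each group B-derived probe.
# Group B itself is included as an assumption (the probes derive from its genes).
PROBE_REACTIVITY = {
    "wzx": {"B", "67"},
    "wbaV": {"B", "67"},
    "wbaP": {"A", "B", "C2", "D1", "D2", "D3", "E1", "E4", "54", "55", "67"},
}


def candidate_groups(positive_probes):
    """Return the O antigen groups consistent with every positive probe."""
    sets = [PROBE_REACTIVITY[probe] for probe in positive_probes]
    if not sets:
        return set()
    return set.intersection(*sets)


# A sample positive for wzx and wbaP is narrowed to group B or group O67,
# which is the ambiguity discussed in the text.
print(sorted(candidate_groups(["wzx", "wbaP"])))
```

On this reading wbaP alone narrows the candidates very little, whereas wzx or wbaV narrows them to group B and the rare group O67.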

We also tested 20 primer pairs for the wzx, wzy and 5 
transferase genes of serotype C2 and found no positives in 
any of the 7 pools (Table 7). 

Groups A, B, D1, D2, D3, C2 and E1 share many genes 
in common. Some of these genes occur with more than one 
sequence, in which case each specific sequence can be named 
after one of the serogroups in which it occurs. The 
distribution of these sequence specificities is shown in 
Table 10. The inventors have aligned the nucleotide 
sequences of Salmonella enterica wzy, wzx genes and 
transferase genes so as to determine specific combinations 
of nucleic acid molecules which can be employed to 
specifically detect and identify the Salmonella enterica 
groups A, B, D1, D2, D3, C2 and E1 (Table 10). The 
results show that many of the O antigen groups can be 
detected and identified using a single specific nucleic 
acid molecule, although other groups, in particular D2 and 
E1, and A and D1, require a panel of nucleic acid molecules 
derived from a combination of genes. 

It will be understood that in carrying out the 
methods of the invention with respect to the testing of 
particular sample types, including samples from food, 
patients and faeces, the samples are prepared by 
techniques routinely used in the preparation of such 
samples for DNA based testing. 
TABLE 1 

Pool No. Strains of which chromosomal DNA is included in the pool Source* 

1 E. coli type strains for O serotypes 1, 2, 3, 4, 10, 16, 18 and 39 IMVS a 

2 E. coli type strains for O serotypes 40, 41, 48, 49, 71, 73, 88 and 100 IMVS 

3 E. coli type strains for O serotypes 102, 109, 119, 120, 121, 125, 126 and 137 IMVS 

4 E. coli type strains for O serotypes 138, 139, 149, 7, 5, 6, 11 and 12 IMVS 

5 E. coli type strains for O serotypes 13, 14, 15, 17, 19ab, 20, 21 and 22 IMVS 

6 E. coli type strains for O serotypes 23, 24, 25, 26, 27, 28, 29 and 30 IMVS 

7 E. coli type strains for O serotypes 32, 33, 34, 35, 36, 37, 38 and 42 IMVS 

8 E. coli type strains for O serotypes 43, 44, 45, 46, 50, 51, 52 and 53 IMVS 

9 E. coli type strains for O serotypes 54, 55, 56, 57, 58, 59, 60 and 61 IMVS 

10 E. coli type strains for O serotypes 62, 63, 64, 65, 66, 68, 69 and 70 IMVS 

11 E. coli type strains for O serotypes 74, 75, 76, 77, 78, 79, 80 and 81 IMVS 

12 E. coli type strains for O serotypes 82, 83, 84, 85, 86, 87, 89 and 90 IMVS 

13 E. coli type strains for O serotypes 91, 92, 95, 96, 97, 98, 99 and 101 IMVS 

14 E. coli type strains for O serotypes 103, 104, 105, 106, 107, 108 and 110 IMVS 

15 E. coli type strains for O serotypes 112, 162, 113, 114, 115, 116, 117 and 118 IMVS 

16 E. coli type strains for O serotypes 123, 165, 166, 167, 168, 169, 170 and 171 See b 

17 E. coli type strains for O serotypes 172, 173, 127, 128, 129, 130, 131 and 132 See c 

18 E. coli type strains for O serotypes 133, 134, 135, 136, 140, 141, 142 and 143 IMVS 

19 E. coli type strains for O serotypes 144, 145, 146, 147, 148, 150, 151 and 152 IMVS 

* 

a. Institute of Medical and Veterinary Science, Adelaide, Australia 

b. 123 from IMVS; the rest from Statens Serum Institut, Copenhagen, Denmark 

c. 172 and 173 from Statens Serum Institut, Copenhagen, Denmark; the rest from IMVS 
TABLE 2 

Pool No. Strains of which chromosomal DNA is included in the pool Source* 

20 E. coli type strains for O serotypes 153, 154, 155, 156, 157, 158, 159 and 160 IMVS 

21 E. coli type strains for O serotypes 161, 163, 164, 8, 9 and 124 IMVS 

22 As pool #21, plus E. coli O111 type strain Stoke W. IMVS 

23 As pool #21, plus E. coli O111:H2 strain C1250-1991 See d 

24 As pool #21, plus E. coli O111:H12 strain C156-1989 See e 

25 As pool #21, plus S. enterica serovar Adelaide See f 

26 Y. pseudotuberculosis strains of O groups IA, IIA, IIB, IIC, III, IVA, IVB, VA, VB, VI and VII See g 

27 S. boydii strains of serogroups 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14 and 15 See h 

28 S. enterica strains of serovars (each representing a different O group) Typhi, Montevideo, Ferruch, Jangwani, Raus, Hvittingfoss, Waycross, Dan, Dugbe, Basel, 65:i:e,n,z15 and 52:d:e,n,x,z15 IMVS 

d. C1250-1991 from Statens Serum Institut, Copenhagen, Denmark 

e. C156-1989 from Statens Serum Institut, Copenhagen, Denmark 

f. S. enterica serovar Adelaide from IMVS 

g. Dr S Aleksic of Institute of Hygiene, Germany 

h. Dr J Lefebvre of Bacterial Identification Section, Laboratoire de Santé Publique du Québec, Canada 
Pool No. Strains of which chromosomal DNA is included in the pool Source* 

29 E. coli type strains for O serotypes 153, 154, 155, 156, 158, 159 and 160 IMVS 

30 E. coli type strains for O serotypes 161, 163, 164, 8, 9, 111 and 124 IMVS 

31 As pool #29, plus E. coli O157 type strain A2 (O157:H19) IMVS 

32 As pool #29, plus E. coli O157:H16 strain C475-89 See d 

33 As pool #29, plus E. coli O157:H45 strain C727-89 See d 

34 As pool #29, plus E. coli O157:H2 strain C252-94 See d 

35 As pool #29, plus E. coli O157:H39 strain C258-94 See d 

36 As pool #29, plus E. coli O157:H26 See e 

37 As pool #29, plus S. enterica serovar Landau See f 

38 As pool #29, plus Brucella abortus See g 

39 As pool #29, plus Y. enterocolitica O9 See h 

40 Y. pseudotuberculosis strains of O groups IA, IIA, IIB, IIC, III, IVA, IVB, VA, VB, VI and VII See i 

41 S. boydii strains of serogroups 1, 3, 4, 5, 6, 8, 9, 10, 11, 12, 14 and 15 See j 

42 S. enterica strains of serovars (each representing a different O group) Typhi, Montevideo, Ferruch, Jangwani, Raus, Hvittingfoss, Waycross, Dan, Dugbe, Basel, 65:i:e,n,z15 and 52:d:e,n,x,z15 IMVS 

43 E. coli type strains for O serotypes 1, 2, 3, 4, 10, 18 and 29 IMVS 

44 As pool #43, plus E. coli K-12 strains C600 and WG1 IMVS, See k 

d. O157 strains from Statens Serum Institut, Copenhagen, Denmark 

e. O157:H26 from Dr R Brown of Royal Children's Hospital, Melbourne, Victoria 

f. S. enterica serovar Landau from Dr M Popoff of Institut Pasteur, Paris, France 

g. B. abortus from the culture collection of The University of Sydney, Sydney, Australia 

h. Y. enterocolitica O9 from Dr. K. Bettelheim of Victorian Infectious Diseases Reference Laboratory, Victoria, Australia 

i. Dr S Aleksic of Institute of Hygiene, Germany 

j. Dr J Lefebvre of Bacterial Identification Section, Laboratoire de Santé Publique du Québec, Canada 

k. Strains C600 and WG1 from Dr. B.J. Bachmann of Department of Biology, Yale University, USA 
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SEQUENCE LISTING 



(1) GENERAL INFORMATION: 



(i) APPLICANT: Reeves, Peter R 
Wang, Lei 

(ii) TITLE OF INVENTION: Nucleic Acid Molecules Specific For 
Bacterial Antigens And Uses Thereof 

(iii) NUMBER OF SEQUENCES: 4 



(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Thomas Gumley 

(B) STREET: 168 Walker Street 

(C) CITY: North Sydney 

(D) STATE: New South Wales 

(E) COUNTRY: Australia 

(F) ZIP: 2068 



(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 



(vi) CURRENT APPLICATION DATA: 
(A) APPLICATION NUMBER: 
(B) FILING DATE: 
(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 
(A) NAME: Gumley, Thomas P 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 99575944 

(B) TELEFAX: 99576288 



(2) INFORMATION FOR SEQ ID NO:1: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14516 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: YES 



(v) ORIGINAL SOURCE: 

(A) ORGANISM: Escherichia coli 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

GATCTGATGG CCGTAGGGCG CTACGTGCTT TCTGCTGATA TCTGGGCTGA GTTGGAAAAA 60 

ACTGCTCCAG GTGCCTGGGG ACGTATTCAA CTGACTGATG CTATTGCAGA GTTGGCTAAA 120 

AAACAGTCTG TTGATGCCAT GCTGATGACC GGCGACAGCT ACGACTGCGG TAAGAAGATG 180 
GGCTATATGC AGGCATTCGT TAAGTATGGG CTGCGCAACC TTAAAGAAGG GGCGAAGTTC 240 

CGTAAGAGCA TCAAGAAGCT ACTGAGTGAG TAGAGATTTA CACGTCTTTG TGACGATAAG 300 

CCAGAAAAAA TAGCGGCAGT TAACATCCAG GCTTCTATGC TTTAAGCAAT GGAATGTTAC 360 

TGCCGTTTTT TATGAAAAAT GACCAATAAT AACAAGTTAA CCTACCAAGT TTAATCTGCT 420 

TTTTGTTGGA TTTTTTCTTG TTTCTGGTCG CATTTGGTAA GACAATTAGC GTGAGTTTTA 480 

GAGAGTTTTG CGGGATCTCG CGGAAGTGCT CACATCTTTG GCATTTAGTT AGTGCACTGG 540 

TAGCTGTTAA GCCAGGGGCG GTAdCTTGCC TAATTAATTT TTAACGTATA CATTTATTCT 600 

TGCCGCTTAT AGCAAATAAA GTCAATCGGA TTAAACTTCT TTTCCATTAG GTAAAAGAGT 660 

GTTTGTAGTC GCTCAGGGAA ATTGGTTTTG GTAGTAGTAC TTTTCAAATT ATCCATTTTC 720 

CGATTTAGAT GGCAGTTGAT GTTACTATGC TGCATACATA TCAATGTATA TTATTTACTT 780 

TTAGAATGTG ATATGAAAAA AATAGTGATC ATAGGCAATG TAGCGTCAAT GATGTTAAGG 840 

TTCAGGAAAG AATTAATCAT GAATTTAGTG AGGCAAGGTG ATAATGTATA TTGTCTAGCA 900 

AATGATTTTT CCACTGAAGA TCTTAAAGTA CTTTCGTCAT GGGGCGTTAA GGGGGTTAAA 960 

TTCTCTCTTA ACTCAAAGGG TATTAATCCT TTTAAGGATA TAATTGCTGT TTATGAACTA 1020 

AAAAAAATTC TTAAGGATAT TTCC<X3NGAT ATTGTATTTT CATATTTTGT AAAGCCAGTA 1080 

ATATTTGGAA CTATTGCTTC AAAGTTGTCA AAAGTGCCAA GGATTGTTGG AATGATTGAA 1140 

GGTCTAGGTA ATGCCTTCAC TTATTATAAG GGAAAGCAGA CCACAAAAAC TAAAATGATA 1200 

AAGTGGATAC AAATTCTTTT ATATAAGTTA GCATTACCGA TGCTTGATGA TTTGATTCTA 1260 

TTAAATCATG ATGATAAAAA AGATTTAATC GATCAGTATA ATATTAAAGC TAAGGTAACA 1320 

GTGTTAGGTG GGATTGGATT GGATCTTAAT GAGTTTTCAT ATAAAGAGCC ACCGAAAGAG 1380 

AAAATTACCT TTATTTTTAT AGCAAGGTTA TTAAGAGAGA AAGGGATATT TGAGTTTATT 1440 

GAAGCCGCAA AGTTCGTTAA GACAACTTAT CCAAGTTCTG AATTTGTAAT TTTAGGAGGT 1500 

TTTGAGAGTA ATAATCCTTT CTCATTACAA AAAAATGAAA TTGAATCGCT AAGAAAAGAA 1560 

CATGATCTTA TTTATCCTGG TCATGTGGAA AATGTTCAAG ATTGGTTAGA GAAAAGTTCT 1620 

GTTTTTGTTT TACCTACATC ATATCGAGAA GGCGTACCAA GGGTGATCCA AGAAGCTATG 1680 

GCTATTGGTA GACCTGTAAT AACAACTAAT GTACCTGGGT GTAGGGATAT AATAAATGAT 1740 

GGGGTCAATG GCTTTTTGAT ACCTCCATTT GAAATTAATT TACTGGCAGA AAAAATGAAA 1800 

TATTTTATTG AGAATAAAGA TAAAGTACTC GAAATGGGGC TTGCTGGAAG GAAGTTTGCA 1860 

GAAAAAAACT TTGATGCTTT TGAAAAAAAT AATAGACTAG CATCAATAAT AAAATCAAAT 1920 

AATGATTTTT GACTTGAGCA GAAATTATTT ATATTTCAAT CTGAAAAATA AAGGCTGTTA 1980 

TTATGAATAA AGTGGCATTA ATTAC?T^STA ^CACTGGSGA AGATGGCTCC TATTTGGCAG 2040 

AATTATTGTT AGSVAAAB^f^TA^S^^ ACGCCGTGCA TCTTCATTTA 2100 

ATACTGAGCG AGTGGATCAC ATCTATCAGG ATTCACATTT AGCTAATCCT AAACTTTTTC 2160 

TACACTATGG CGATTTGACA GATACTTCCA ATCTGACCCG TATTTTAAAA GAAGTTCAAC 2220 
CAGATGAAGT TTACAATTTG GGGGCGATGA GCCATGTAGC GGTATCATTT GAGTCACCAG 2280 

AATACACTGC TGATGTTGAT GCGATAGGAA CATTGCGTCT TCTTGAAGCT ATCAGGATAT 2340 

TGGGGCTGGA AAAAAAGACA AAATTTTATC AGGCTTCAAC TTCAGAGCTT TATGGTTTGG 2400 

TTCAAGAAAT TCCACAAAAA GAGACTACGC CATTTTATCC ACGTTCGCCT TATGCTGTTG 2460 

CAAAATTATA TGCCTATTGG ATCACTGTTA ATTATCGTGA GTCTTATGGT ATGTTTGCCT 2520 

GCAATGGTAT TCTCTTTAAC CACGAATCAC CTCGCCGTGG CGAGACCTTT GTTACTCGTA 2580 

AAATAACACG CGGGATAGCA AATATTGCTC AAGGTCTTGA TAAATGCTTA TACTTGGGAA 2640 

ATATGGATTC TCTGCGTGAT TGGGGACATG CTAAGGATTA TGTCAAAATG CAATGGATGA 2700 

TGCTGCAGCA AGAAACTCCA GAAGATTTTG TAATTGCTAC AGGAATTCAA TATTCTGTCC 2760 

GTGAGTTTGT CACAATGGCG GCAGAGCAAG TAGGCATAGA GTTAGCATTT GAAGGTGAGG 2820 

GAGTAAATGA AAAAGGTGTT GTTGTTTCGG TCAATGGCAC TGATGCTAAA GCTGTAAACC 2880 

CGGGCGATGT AATTATATCT GTAGATCCAA GGTATTTTAG GCCTGCAGAA GTTGAAACCT 2940 

TGCTTGGCGA TCCTACTAAT GCGCATAAAA AATTAGGATG GAGCCCTGAA ATTACATTGC 3000 

GTGAAATGGT AAAAGAAATG GTTTCCAGCG ATTTAGCAAT AGCGAAAAAG AACGTCTTGC 3060 

TGAAAGCTAA TAACATTGCC ACTAATATTC CGCAAGAATA AAAAAGATAA TACATTAAAT 3120 

AATTAAAAAT GGTGCTAGAT TTATTAGTAC CATTATTTTT TTTTGGGTGA CTAATGTTTA 3180 

TTACATCAGA TAAATTTAGA GAAATTATCA AGTTAGTTCC ATTAGTATCA ATTGATCTGC 3240 

TAATTGAAAA CGAGAATGGT GAATATTTAT TTGGTCTTAG GAATAATCGA CCGGCCAAAA 3300 

ATTATTTTTT TGTTCCAGGT GGTAGGATTC GCAAAAATGA ATCTATTAAA AATGCTTTTA 3360 

AAAGAATATC ATCTATGGAA TTAGGTAAAG AGTATGGTAT TTCAGGAAGT GTTTTTAATG 3420 

GTGTATGGGA ACATTTCTAT GATGATGGTT TTTTTTCTGA AGGCGAGGCA ACACATTATA 3480 

TAGTGCTTTG TTACACACTG AAAGTTCTTA AAAGTGAATT GAATCTCCCA GATGATCAAC 3540 

ATCGTGAATA CCTTTGGCTA ACTAAACACC AAATAAATGC TAAACAAGAT GTTCATAACT 3600 

ATTCAAAAAA TTATTTTTTG TAATTTTTAT TAAAAATTAA TATGCGAGAG AATTGTATGT 3660 

CTCAATGTCT TTACCCTGTA ATTATTGCCG GAGGAACCGG AAGCCGTCTA TGGCCGTTGT 3720 

CTCGAGTATT ATACCCTAAA CAATTTTTAA ATTTAGTTGG GGATTCTACA ATGTTGCAAA 3780 

CAACAATTAC GCGTTTGGAT GGCATCGAAT GCGAAAATCC AATTGTTATC TGCAATGAAG 3840 

ATCACCGATT TATTGTAGCA GAGCAATTAC GACAGATTGG TAAGCTAACC AAGAATATTA 3900 

TACTTGAGCC GAAAGGCCGT AATACTGCAC CTGCCATAGC TTTAGCTGCT TTTATCGCTC 3960 

AGAAGAATAA TCCTAATGAC GACCCTTTAT TATTAGTACT TGCGGCAGAC CACTCTATAA 4020 

ATAATGAAAA AGCATTTCGA GAGTCAATAA TAAAAGCTAT GCCGTATGCA ACTTCTGGGA 4080 

AGTTAGTAAC ATTTGGAATT ATTCCGGACA CGGCAAATAC TGGTTATGGA TATATTAAGA 4140 

GAAGTTCTTC AGCTGATCCT AATAAAGAAT TCCCAGCATA TAATGTTGCG GAGTTTGTAG 4200 

AAAAACCAGA TGTTAAAACA GCACAGGAAT ATATTTCGAG TGGGAATTAT TACTGGAATA 4260 
GCGGAATGTT TTTATTTCGC GCCAGTAAAT ATCTTGATGA ACTACGGAAA TTTAGACCAG 4320 

ATATTTATCA TAGCTGTGAA TGTGCAACCG CTACAGCAAA TATAGATATG GACTTTGTCC 4380 

GAATTAACGA GGCTGAGTTT ATTAATTGTC CTGAAGAGTC TATCGATTAT GCTGTGATGG 4440 

AAAAAACAAA AGACGCTGTA GTTCTTCCGA TAGATATTGG CTGGAATGAC GTGGGTTCTT 4500 

GGTCATCACT TTGGGATATA AGCCAAAAGG ATTGCCATGG TAATGTGTGC CATGGGGATG 4560 

TGCTCAATCA TGATGGAGAA AATAGTTTTA TTTACTCTGA GTCAAGTCTG GTTGCGACAG 4620 

TCGGAGTAAG TAATTTAGTA ATTGTCCAAA CCAAGGATGC TGTACTGGTT GCGGACCGTG 4680 

ATAAAGTCCA AAATGTTAAA AACATAGTTG ACGATCTAAA AAAGAGAAAA CGTGCTGAAT 4740 

ACTACATGCA TCGTGCAGTT TTTCGCCCTT GGGGTAAATT CGATGCAATA GACCAAGGCG 4800 

ATAGATATAG AGTAAAAAAA ATAATAGTTA AACCAGGAGA AGGGTTAGAT TTAAGGATGC 4860 

ATCATCATAG GGCAGAGCAT TGGATTGTTG TATCCGGTAC TGCTAAAGTT TCACTAGGTA 4920 

GTGAAGTTAA ACTATTAGTT TCTAATGAGT CTATATATAT CCCTCAGGGA GCAAAATATA 4980 

GTCTTGAGAA TCCAGGCGTA ATACCTTTGC ATCTAATTGA AGTAAGTTCT GGTGATTACC 5040 

TTGAATCAGA TGATATAGTG CGTTTTACTG ACAGATATAA CAGTAAACAA TTCCTAAAGC 5100 

GAGATTGATA AATATGAATA AAATAACTTG CTTCAAAGCA TATGATATAC GTGGGCGTCT 5160 

TGGTGCTGAA TTGAATGATG AAATAGCATA TAGAATTGGT CGCGCTTATG GTGAGTTTTT 5220 

TAAACCTCAA ACTGTAGTTG TGGGAGGAGA TGCTCGCTTA ACAAGTGAGA GTTTAAAGAA 5280 

ATCACTCTCA AATGGGCTAT GTGATGCAGG CGTAAATGTC TTAGATCTTG GAATGTGTGG 5340 

TACTGAAGAG ATATATTTTT CCACTTGGTA TTTAGGAATT GATGGTGGAA TCGAGGTAAC 5400 

TGCAAGCCAT AATCCAATTG ATTATAATGG AATGAAATTA GTAACCAAAG GTGCTCGACC 5460 

AATCAGCAGT GACACAGGTC TCAAAGATAT ACAACAATTA GTAGAGAGTA ATAATTTTGA 5520 

AGAGCTCAAC CTAGAAAAAA AAGGGAATAT TACCAAATAT TCCACCCGAG ATGCCTACAT 5580 

AAATCATTTG ATGGGCTATG CTAATCTGCA AAAAATAAAA AAAATCAAAA TAGTTGTGAA 5640 

TTCTGGGAAT GGTGCAGCTG GTCCTGTTAT TGATGCTATT GAGGAATGCT TTTTACGGAA 5700 

CAATATTCCG ATTCAGTTTG TAAAAATAAA TAATACACCC GATGGTAATT TTCCACATGG 5760 

TATCCCTAAT CCATTACTAC CTGAGTGCAG AGAAGATACC AGCAGTGCGG TTATAAGACA 5820 

TAGTGCTGAT TTTGGTATTG CATTTGATGG TGATTTTGAT AGGTGTTTTT TCTTTGATGA 5880 

AAATGGACAA TTTATTGAAG GATACTACAT TGTTGGTTTA TTAGCGGAAG TTTTTTTAGG 5940 

GAAATATCCA AACGCAAAAA TCATTCATGA TCCTCGCCTT ATATGGAATA CTATTGATAT 6000 

CGTAGAAAGT CATGGTGGTA TACCTATAAT GACTAAAACC GGTCATGCTT ACATTAAGCA 6060 

AAGAATGCGT GAAGAGGATG CCGTATATGG CGGCGAAATG AGTGCGCATC ATTATTTTAA 6120 

AGATTTTGCA TACTGCGATA GTGGAATGAT TCCTTGGATT TTAATTTGTG AACTTTTGAG 6180 

TCTGACAAAT AAAAAATTAG GTGAACTGGT TTGTGGTTGT ATAAACGACT GGCCGGCAAG 6240 

TGGAGAAATA AACTGTACAC TAGACAATCC GCAAAATGAA ATAGATAAAT TATTTAATCG 6300 

CACTGATGGA TTAACTATGG AGTTCTCTGA 6360 

TTGGCGTTTT AATGTTAGAT GCTCAAATAC AGAACCTGTA GTACGATTGA ATGTAGAATC 6420 

TAGGAATAAT GCTATTCTTA TGCAGGAAAA AACAGAAGAA ATTCTGAATT TTATATCAAA 6480 

ATAAATTTGC ACCTGAGTTC ATAATGGGAA CAAGAAATAT ATGAAAGTAC TTCTGACTGG 6540 

CTCAACTGGC ATGGTTGGTA AGAATATATT AGAGCATGAT AGTGCAAGTA AATATAATAT 6600 

ACTTACTCCA ACCAGCTCTG ATTTGAATTT ATTAGATAAA AATGAAATAG AAAAATTCAT 6660 

GCTTATCAAC ATGCCAGACT GTATTATACA TGCAGCGGGA TTAGTTGGAG GCATTCATGC 6720 

AAATATAAGC AGGCCGTTTG ATTTTCTGGA AAAAAATTTG CAGATGGG1T TAAATTTAGT 6780 

TTCCGTCGCA AAAAAACTAG GTATCAAGAA AGTGCTTAAC TTGGGTAGTT CATGCATGTA 6840 

CCCCAAAAAC TTTGAAGAGG CTATTCCTGA GAAAGCTCTG TTAACTGGTG AGCTAGAAGA 6900 

AACTAATGAG GGATATGCTA TTGCGAAAAT TGCTGTAGCA AAAGCATGCG AATATATATC 6960 

AAGAGAAAAC TCTAATTATT TTTATAAAAC AATTATCCCA TGTAATTTAT ATGGGAAATA 7020 

TGATAAATTT GATGATAACT CGTCACATAT GATTCCGGCA GTTATAAAAA AAATCCATCA 7080 

TGCGAAAATT AATAATGTCC CAGAGATCGA AATTTGGGGG GATGGTAATT CGCGCCGTGA 7140 

GTTTATGTAT GCAGAAGATT TAGCTGATCT TATTTTTTAT GTTATTCCTA AAATAGAATT 7200 

CATGCCTAAT ATGGTAAATG CTGGTTTAGG TTACGATTAT TCAATTAATG ACTATTATAA 7260 

GATAATTGCA GAAGAAATTG GTTATACTGG GAGTTTTTCT CATGATTTAA CAAAACCAAC 7320 

AGGAATGAAA CGGAAGCTAG TAGATATTTC ATTGCTTAAT AAAATTGGTT GGTCAAGTCA 7380 

CTTTGAACTC AGAGATGGCA TCAGAAAGAC CTATAATTAT TACTTGGAGA ATCAAAATAA 7440 

ATGATTACAT ACCCACTTGC TAGTAATACT TGGGATGAAT ATGAGTATGC AGCAATACAG 7500 

TCAGTAATTG ACTCAAAAAT GTTTACCATG GGTAAAAAGG TTGAGTTATA TGAGAAAAAT 7560 

TTTGCTGATT TGTTTGGTAG CAAATATGCC GTAATGGTTA GCTCTGGTTC TACAGCTAAT 7620 

CTGTTAATGA TTGCTGCCCT TTTCTTCACT AATAAACCAA AACTTAAAAG AGGTGATGAA 7680 

ATAATAGTAC CTGCAGTGTC ATGGTCTACG ACATATTACC CTCTGCAACA GTATGGCTTA 7740 

AAGGTGAAGT TTGTCGATAT CAATAAAGAA ACTTTAAATA TTGATATCGA TAGTTTGAAA 7800 

AATGCTATTT CAGATAAAAC AAAAGCAATA TTGACAGTAA ATTTATTAGG TAATCCTAAT 7860 

GATTTTGCAA AAATAAATGA GATAATAAAT AATAGGGATA TTATCTTACT AGAAGATAAC 7920 

TGTGAGTCGA TGGGCGCGGT CTTTCAAAAT AAGCAGGCAG GCACATTCGG AGTTATGGGT 7980 

ACCTTTAGTT CTTTTTACTC TCATCATATA GCTACAATGG AAGGGGGCTG CGTAGTTACT 8040 

GATGATGAAG AGCTGTATCA TGTATTGTTG TGCCTTCGAG CTCATGGTTG GACAAGAAAT 8100 

TTACCAAAAG AGAATATGGT TACAGGCACT AAGAGTGATG ATATTTTCGA AGAGTCGTTT 8160 

AAGTTTGTTT TACCAGGATA CAATGTTCGC CCACTTGAAA TGAGTGGTGC TATTGGGATA 8220 

GAGCAACTTA AAAAGTTACC AGGTTTTATA TCCACCAGAC GTTCCAATGC ACAATATTTT 8280 

GTAGATAAAT TTAAAGATCA TCCATTCCTT GATATACAAA AAGAAGTTGG TGAAAGTAGC 8340 

TGGTTTGGTT TTTCCTTCGT TATAAAGGAG GGAGCTGCTA TTGAGAGGAA GAGTTTAGTA 8400 

AATAATCTGA TCTCAGCAGG CATTGAATGC CGACCAATTG TTACTGGGAA TTTTCTCAAA 8460 

AATGAACGTG TTTTGAGTTA TTTTGATTAC TCTGTACATG ATACGGTAGC AAATGCCGAA 8520 

TATATAGATA AGAATGGTTT TTTTGTCGGA AACCACCAGA TACCTTTGTT TAATGAAATA 8580 

GATTATCTAC GAAAAGTATT AAAATAACTA ACGAGGCACT CTATTTCGAA TAGAGTGCCT 8640 

TTAAGATGGT ATTAACAGTG AAAAAAATTT TAGCGTTTGG CTATTCTAAA GTACTACCAC 8700 

CGGTTATTGA ACAGTTTGTC AATCCAATTT GCATCTTCAT TATCACACCA CTAATACTCA 8760 

ACCACCTGGG TAAGCAAAGC TATGGTAATT GGATTTTATT AATTACTATT GTATCTTTTT 8820 

CTCAGTTAAT ATGTGGAGGA TGTTCCGCAT GGATTGCAAA AATCATTGCA GAACAGAGAA 8880 

TTCTTAGTGA TTTATCAAAA AAAAATGCTT TACGTCAAAT TTCCTATAAT TTTTCAATTG 8940 

TTATTATCGC ATTTGCGGTA TTGATTTCTT TTCTTATATT AAGTATTTGT TTCTTCGATG 9000 

TTGCGAGGAA TAATTCTTCA TTCTTATTCG CGATTATTAT TTGTGGTTTT TTTCAGGAAG 9060 

TTGATAATTT ATTTAGTGGT GCGCTAAAAG GTTTTGAAAA ATTTAATGTA TCATGTTTTT 9120 

TTGAAGTAAT TACAAGAGTG CTCTGGGCTT CTATAGTAAT ATATGGCATT TACGGAAATG 9180 

CACTCTTATA TTTTACATGT TTAGCCTTTA CCATTAAAGG TATGCTAAAA TATATTCTTG 9240 

TATGTCTGAA TATTACCGGT TGTTTCATCA ATCCTAATTT TAATAGAGTT GGGATTGTTA 9300 

ATTTGTTAAA TGAGTCAAAA TGGATGTTTC TTCAATTAAC TGGTGGCGTC TCACTTAGTT 9360 

TGTTTGATAG GCTCGTAATA CCATTGATTT TATCTGTCAG TAAACTGGCT TCTTATGTCC 9420 
CTTGCCTTCA ACTAGCTCAA TTGATGTTCA CTCTTTCTGC GTCTGCAAAT CAAATATTAC 9480 
TACCAATGTT TGCTAGAATG AAAGCATCTA ACACATTTCC CTCTAATTGT TTTTTTAAAA 9540 
TTCTGCTTGT ATCACTAATT TCTGTTTTGC CTTGTCTTGC GTTATTCTTT TTTGGTCGTG 9600 
ATATATTATC AATATGGATA AACCCTACAT TTGCAACTGA AAATTATAAA TTAATGCAAA 9660 
TTTTAGCTAT AAGTTACATT TTATTGTCAA TGATGACATC TTTTCATTTC TTGTTATTAG 9720 
GAATTGGTAA ATCTAAGCTT GTTGCAAATT TAAATCTGGT TGCAGGGCTC GCACTTGCTG 9780 
CTTCAACGTT AATCGCAGCT CATTATGGCC TTTATGCAAT ATCTATGGTA AAAATAATAT 9840 
ATCCGGCTTT TCAATTTTAT TACCTTTATG TAGCTTTTGT CTATTTTAAT AGAGCGAAAA 9900 
ATGTCTATTG ATTTACTTTT TTCAATTACT GAAATCGCAA TTGTTTTTTC TTGCACTATT 9960 

TACATATTTA CTCAATGTTT GTTAATGCGG AGGATCTATT TAGATAAAAG TATTTTAATT 10020 

CTTTTATGCT TGCTCTTTTT TTTAGTAATC ATTCAACTTC CTGAGCTTAA TGTAAACGGT 10080 

TTGGTCGATT CTTTAAAGTT ATCACTGCCT TTATTGATGG TCTTTATCGC TTTTCAAAAA 10140 

CCGAAATTAT GCTTGTGGGT TATTATTGCA TTGTTGTTTT TGAACTCTGC ATTTAATTTT 10200 

TTATATTTAA AGACATTCGA TAAGTTTAGC TCATTTCCTT TTACTTTTTT TATATTGCTG 10260 

TTTTACTTGT TTAGATTGGG AATTGGTAAT TTACCGGTTT ATAAAAATAA AAAATTTTAC 10320 

GCGTTGATTT TTCTCTTTAT ATTAATAGAC ATAATGCAGT CATTGTTAAT AAATTATAGG 10380 
GGGCAGATTT TATATTCCGT AATTTGCATC CTGATACTTG TGTTTAAAGT TAATTTAAGA 10440 

AAAAAGATTC CATACTTTTT TTTAATGCTG CCAGTTTTAT ATGTAATTAT TATGGCTTAT 10500 

ATTGGTTTTA ATTATTTCAA TAAAGGCGTA ACTTTTTTTG AACCTACAGC AAGTAATATT 10560 

GAACGTACGG GGATGATATA TTATTTGGTT TCACAGCTTG GTGATTATAT ATTCCATGGT 10620 

ATGGGGACAT TAAATTTCTT AAATAACGGC GGACAATATA AGACGTTATA TGGACTTCCA 10680 

TCATTAATTC CTAATGACCC TCATGATTTT TTATTACGGT TCTTTATAAG TATTGGTGTG 10740 

ATAGGAGCAT TGGTTTATCA TTCTATATTT TTTGTTTTTT TTAGGAGAAT ATCTTTCTTA 10800 

TTATATGAGA GAAATGCTCC TTTCATTGTT GTAAGTTGTT TGTTACTGTT ACAAGTTGTG 10860 

TTAATTTATA CATTAAACCC TTTTGATGCT TTTAATCGAT TGATTTGCGG GCTTACAGTT 10920 

GGAGTTGTTT ATGGATTTGC AAAAATTAGA TAAGTATACC TGTAATGGAA ATTTAGACGC 10980 

TCCACTTGTT TCAATAATCA TTGCAACTTA TAATTCTGAA CTTGATATAG CTAAGTGTTT 11040 

GCAATCGGTA ACTAATCAAT CTTATAAGAA TATTGAAATC ATAATAATGG ATGGAGGATC 11100 

TTCTGATAAA ACGCTTGATA TTGCAAAATC GTTTAAAGAC GACCGAATAA AAATAGTTTC 11160 

AGAGAAAGAT CGTGGAATTT ATGATGCCTG GAATAAAGCA GTTGATTTAT CCATTGGTGA 11220 

TTGGGTAGCA TTTATTGGTT CAGATGATGT TTACTATCAT ACAGATGCAA TTGCTTCATT 11280 

GATGAAGGGG GTTATGGTAT CTAATGGCGC CCCTGTGGTT TATGGGAGGA CAGCGCACGA 11340 
AGGTCCCGAT AGGAACATAT CTGGATTTTC AGGCAGTGAA TGGTACAACC TAACAGGATT 11400 
TAAGTTTAAT TATTACAAAT GTAATTTACC ATTGCCCATT ATGAGCGCAA TATATTCTCG 11460 
TGATTTCTTC AGAAACGAAC GTTTTGATAT TAAATTAAAA ATTGTTGCTG ACGCTGATTG 11520 
GTTTCTGAGA TGTTTCATCA AATGGAGTAA AGAGAAGTCA CCTTATTTTA TTAATGACAC 11580 
GACCCCTATT GTTAGAATGG GATATGGTGG GGTTTCGACT GATATTTCTT CTCAAGTTAA 11640 
AACTACGCTA GAAAGTTTCA TTGTACGCAA AAAGAATAAT ATATCCTGTT TAAACATACA 11700 
GCTGATTCTT AGATATGCTA AAATTCTGGT GATGGTAGCG ATCAAAAATA TTTTTGGCAA 11760 
TAATGTTTAT AAATTAATGC ATAACGGGTA TCATTCCCTA AAGAAAATCA AGAATAAAAT 11820 
ATGAAGATTG TTTATATAAT AACCGGGCTT ACTTGTGGTG GAGCCGAACA CCTTATGACG 11880 
CAGTTAGCAG ACCAAATGTT TATACGCGGG CATGATGTTA ATATTATTTG TCTAACTGGT 11940 
ATATCTGAGG TAAAGCCAAC ACAAAATATT AATATTCATT ATGTTAATAT GGATAAAAAT 12000 
TTTAGAAGCT TTTTTAGAGC TTTATTTCAA GTAAAAAAAA TAATTGTCGC CTTAAAGCCA 12060 
GATATAATAC ATAGTCATAT GTTTCATGCT AATATTTTTA GTCGTTTTAT TAGGATGCTG 12120 
ATTCCAGCGG TGCCCCTGAT ATGTACCGCA CACAACAAAA ATGAAGGTGG CAATGCAAGG 12180 
ATGTTTTGTT ATCGACTGAG TGATTTTTTA GCTTCTATTA CTACAAATGT AAGTAAAGAG 12240 
GCTGTTCAAG AGTTTATAGC AAGAAAGGCT ACACCTAAAA ATAAAATAGT AGAGATTCCG 12300 
AATTTTATTA ATACAAATAA ATTTGATTTT GATATTAATG TCAGAAAGAA AACGCGAGAT 12360 
GCTTTTAATT TGAAAGACAG TACAGCAGTA CTGCTCGCAG TAGGAAGACT TGTTGAAGCA 12420 
AAAGACTATC CGAACTTATT AAATGCAATA AATCATTTGA TTCTTTCAAA AACATCAAAT 12480 

TGTAATGATT TTATTTTGCT TATTGCTGGC GATGGCGCAT TAAGAAATAA ATTATTGGAT 12540 

TTGGTTTGTC AATTGAATCT TGTGGATAAA GTTTTCTTCT TGGGGCAAAG AAGTGATATT 12600 

AAAGAATTAA TGTGTGCTGC AGATCTTTTT GTTTTGAGTT CTGAGTGGGA AGGTTTTGGT 12660 

CTCGTTGTTG CAGAAGCTAT GGCGTGTGAA CGTCCCGTTG TTGCTACCGA TTCTGGTGGA 12720 

GTTAAAGAAG TCGTTGGACC TCATAATGAT GTTATCCCTG TCAGTAATCA TATTCTGTTG 12780 

GCAGAGAAAA TCGCTGAGAC ACTTAAAATA GATGATAACG CAAGAAAAAT AATAGGTATG 12840 

AAAAATAGAG AATATATTGT TTCCAATTTT TCAATTAAAA CGATAGTGAG TGAGTGGGAG 12900 

CGCTTATATT TTAAATATTC CAAGCGTAAT AATATAATTG ATTGAAAATA TAAGTTTGTA 12960 

CTCTGGATGC AATAGTTTCT CTATGCTGTT TTTTTACTGG CTCCGTATTT TTACTTATAG 13020 

CTGGATTTTG TTATATATCA GTATTAATCT GTCTCAACTT CATCTAGACT ACATTCAAGC 13080 

CGCGCATGCG TCGCGCGGTG ACTACACCTG ACAGGAGTAT GTAATGTCCA AGCAACAGAT 13140 

CGGCGTCGTC GGTATGGCAG TGATGGGGCG CAACCTGGCG CTCAACATCG AAAGCCGCGG 13200 

TTATACCGTC TCCATCTTCA ACCGCTCCCG CGAGAAAACT GAAGAAGTTG TTGCCGAGAA 13260 

CCCGGATAAG AAACTGGTTC CTTATTACAC GGTGAAAGAG TTCGTCGAGT CTCTTGAAAC 13320 

CCCACGTCGT ATCCTGTTAA TGGTAAAAGC AGGGGCGGGA ACTGATGCTG CTATCGATTC 13380 

CCTGAAGCCG TATCTGGATA AAGGCGACAT CATTATTGAT GGTGGCAACA CCTTCTTCCA 13440 

GGACACTATC CGTCGTAACC GTGAACTGTC CGCGGAAGGC TTTAACTTCA TCGGTACCGG 13500 

CGTGTCCGGC GGTGAAGAGG GCGCCCTGAA AGGCCCATCT ATCATGCCAG GTGGCCAGAA 13560 

AGAAGCGTAT GAGCTGGTTG CGCCTATCCT GACCAAGATT GCTGCGGTTG CTGAAGATGG 13620 

CGAACCATGT ATAACTTACA TCGGTGCTGA CGGTGCGGGT CACTACGTGA AGATGGTGCA 13680 

CAACGGTATC GAATATGGCG ATATGCAGCT GATTGCTGAA GCCTATTCTC TGCTTAAAGG 13740 

CGGCCTTAAT CTGTCTAACG AAGAGCTGGC AACCACTTTT ACCGAGTGGA ATGAAGGCGA 13800 

GCTAAGTAGC TACCTGATTG ACATCACCAA AGACATCTTC ACCAAAAAAG ATGAAGAGGG 13860 

TAAATACCTG GTTGATGTGA TCCTGGACGA AGCTGCGAAC AAAGGCACCG GTAAATGGAC 13920 

CAGCCAGAGC TCTCTGGATC TGGGTGAACC GCTGTCGCTG ATCACCGAAT CCGTATTCGC 13980 

TCGCTACATC TCTTCTCTGA AAGACCAGCG CATTGCGGCA TCTAAAGTGC TGTCTGGTCC 14040 

GCAGGCTAAA CTGGCTGGTG ATAAAGCAGA GTTCGTTGAG AAAGTCCGTC GCGCGCTGTA 14100 

CCTGGGTAAA ATCGTCTCTT ATGCCGAAGG CTTCTCTCAA CTGCGTGCCG CGTCTGACGA 14160 
ATACAACTGG GATCTGAACT ACGGCGAAAT CGCGAAGATC TTCCGCGCGG GCTGCATCAT 14220 
TCGTGCGCAG TTCCTGCAGA AAATTACTGA CGCGTATGCT GAAAACAAAG GCATTGCTAA 14280 
CCTGTTGCTG GCTCCGTACT TCAAAAATAT CGCTGATGAA TATCAGCAAG CGCTGCGTGA 14340 
TGTAGTGGCT TATGCTGTGC AGAACGGTAT TCCGGTACCG ACCTTCTCTG CAGCGGTAGC 14400 
""ACTACGAC AGCTACCGTT CTGCGGTACT GCCGGCTAAT CTGATTCAGG CACAGCGTGA 14460 
TTACTTCGGT GCGCACACGT ATAAACGCAC TGATAAAGAA GGTGTGTTCC ACACCG 14516 
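
Each row of the listing above consists of groups of ten bases followed by the running position of the last base in the row, so a transcribed copy can be checked mechanically. The following Python sketch is offered only as one possible consistency check; the function name `check_rows` and its messages are hypothetical and are not part of the PatentIn format.

```python
import re


def check_rows(rows):
    """Return (row number, message) pairs for rows that look damaged.

    A row is expected to be groups of bases followed by the cumulative
    position of its last base, e.g. 'GATCTGATGG CCGTAGGGCG 20'.
    """
    problems = []
    expected = 0
    for number, row in enumerate(rows, start=1):
        tokens = row.split()
        if len(tokens) < 2:
            continue  # blank line or other non-sequence line
        *groups, position = tokens
        bases = "".join(groups)
        expected += len(bases)
        if not re.fullmatch(r"[ACGT]+", bases):
            problems.append((number, "non-ACGT character"))
        if not position.isdigit():
            problems.append((number, "unreadable position"))
        elif int(position) != expected:
            problems.append((number, f"position {position}, expected {expected}"))
            expected = int(position)  # resynchronise on the printed position
    return problems


sample = [
    "GATCTGATGG CCGTAGGGCG CTACGTGCTT TCTGCTGATA TCTGGGCTGA GTTGGAAAAA 60",
    "ACTGCTCCAG GTGCCTGGGG ACGTATTCAA CTGACTGATG CTATTGCAGA GTTGGCTAAA 120",
]
print(check_rows(sample))
```

The check flags characters other than A, C, G and T and positions that do not match the running base count; it cannot recover a base that is illegible in the source.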

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14024 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: YES 
(v) ORIGINAL SOURCE: 

(A) ORGANISM: Escherichia coli 
(vi) Note that the first i'9jap" is from the primer used for the long PCR 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

GTAACCAAGG GCGGTACGTG GATAAATTTT AATGCTTATC AAAACTATTA GCATTAAAAA 60 

TATATAAGAA ATTCTCAAAT GAACAAAGAA ACCGTTTCAA TAATTATGCC CGTTTACAAT 120 

GGGGCCAAAA CTATAATCTC ATCAGTAGAA TCAATTATAC ATCAATCTTA TCAAGATTTT 180 

GTTTTGTATA TCATTGACGA TTGTAGCACC GATGATACAT TTTCATTAAT CAACAGTCGA 240 

TACAAAAACA ATCAGAAAAT AAGAATATTG CGTAACAAGA CAAATTTAGG TGTTGCAGAA 300 

AGTCGAAATT ATGGAATAGA AATGQCCAGG GGGAAATATA TTTCTTTTTG TGATGCGGAT 360 

GATTTGTGGC ACGAGAAAAA ATTAGAGCGT CAAATCGAAG TGTTAAATAA TGAATGTGTA 420 

GATGTGGTAT GTTCTAATTA TTATGTTATA GATAACAATA GAAATATTGT TGGCGAAGTT 480 

AATGCTCCTC ATGTGATAAA TTATAGAAAA ATGCTCATGA AAAACTACAT AGGGAATTTG 540 

ACAGGAATCT ATAATGCCAA CAAATTGGGT AAGTTTTATC AAAAAAAGAT TGGTCACGAG 600 

GATTATTTGA TGTGGCTGGA AATAATTAAT AAAACAAATG GTGCTATTTG TATTCAAGAT 660 

AATCTGGCGT ATTACATGCG TTCAAATAAT TCACTATCGG GTAATAAAAT TAAAGCTGCA 720 

AAATGGACAT GGAGTATATA TAGAGAACAT TTACATTTGT CCTTTCCAAA AACATTATAT 780 

TATTTTTTAT TATATGCTTC AAATGGAGTC ATGAAAAAAA TAACACATTC ACTATTAAGG 840 

AGAAAGGAGA CTAAAAAGTG AAGTCAGGGG CTAAGTTGAT TTTTTTATTC CTATTTACAC 900 

TTTATAGTCT CCAGTTGTAT GGGGTTATCA TAGATGATCG TATAACAAAT TTTGATACAA 960 

AGGTATTAAC TAGTATTATA ATTA^ATTTG AGATTTTTTT TGTTTTATTA TTTTATCTAA 1020 

CGATTATAAA TGAAAGAAAA CAGCAGAAAA AATTTATCGT GAACTGGGAG CTAAAGTTAA 1080 

TACTCGTTTT CCTTTTTGTG ACTATAGAAA TTGCTGCTGT AGTTTTATTT CTTAAAGAAG 1140 



GTATTCCTAT ATTTGATGAT GATCCAGGGG GGGCTAAACT TAGAATAGCT GAAGGTAATG 1200 

GACTTTACAT TAGATATATT AAGTATTTTG GTAATATAGT TGTGTTTGCA TTAATTATTC 1260 

TTTATGATGA GCATAAATTC AAACAGAGGA CCATCATATT TGTATATTTT ACAACGATTG 1320 

CTTTATTTGG TTATCGTTCT GAATTGGTGT TGCTCATTCT TCAATATATA TTGATTACCA 1380 

ATATCCTGTC AAAGGATAAC CGTAATCCTA AAATAAAAAG AATAATAGGG TATTTTTTAT 1440 

TGGTAGGGGT TGTATGCTCG TTGTTTTATC TAAGTTTAGG ACAAGACGGA GAACAAAATG 1500 

ACTCATATAA TAATATGTTA AGGATAATTA ATAGGTTAAC AATAGAGCAA GTTGAAGGTG 1560 

TTCCATATGT TGTTTCTGAA TCTATTAAGA ACGATTTCTT TCCGACACCA GAGTTAGAAA 1620 

AGGAATTAAA AGCAATAATA AATAGAATAC AGGGAATAAA GCATCAAGAC TTATTTTATG 1680 

GAGAACGGTT ACATAAACAA GTATTTGGAG ACATGGGAGC AAATTTTTTA TCAGTTACTA 1740 

CGTATGGAGC AGAACTGTTA GTTTTTTTTG GTTTTCTCTG TGTATTCATT ATCCCTTTAG 1800 

GGATATATAT ACCTTTTTAT CTTTTAAAGA GAATGAAAAA AACCCATAGC TCGATAAATT 1860 

GCGCATTCTA TTCATATATC ATTATGATTT TATTGCAATA CTTAGTGGCT GGGAATGCAT 1920 

CGGCCTTCTT TTTTGGTCCT TTTCTCTCCG TATTGATAAT GTGTACTCCT CTGATCTTAT 1980 

TGCATGATAC GTTAAAGAGA TTATCACGAA ATGAAAATAT CAGTTATAAC TGTGACTTAT 2040 

AATAATGCTG AAGGGTTAGA AAAAACTTTA AGTAGTTTAT CAATTTTAAA AATAAAACCT 2100 

TTTGAGATTA TTATAGTTGA TGGCGGCTCT ACAGATGGAA CGAATCGTGT CATTAGTAGA 2160 

TTTACTAGTA TGAATATTAC ACATGTTTAT GAAAAAGATG AAGGGATATA TGATGCGATG 2220 

AATAAGGGCC GAATGTTGGC CAAAGGCGAC TTAATACATT ATTTAAACGC CGGCGATAGC 2280 

GTAATTGGAG ATATATATAA AAATATCAAA GAGCCATGTT TGATTAAAGT TGGCCTTTTC 2340 

GAAAATGATA AACTTCTGGG ATTTTCTTCT ATAACCCATT CAAATACAGG GTATTGTCAT 2400 

CAAGGGGTGA TTTTCCCAAA GAATCATTCA GAATATGATC TAAGGTATAA AATATGTGCT 2460 

GATTATAAGC TTATTCAAGA GGTGTTTCCT GAAGGGTTAA GATCTCTATC TTTGATTACT 2520 

TCGGGTTATG TAAAATATGA TATGGGGGGA GTATCTTCAA AAAAAAGAAT TTTAAGAGAT 2580 
AAAGAGCTTG CCAAAATTAT GTTTGAAAAA AATAAAAAAA ACCTTATTAA GTTTATTCCA 2640 
ATTTCAATAA TCAAAATTTT ATTCCCTGAA CGTTTAAGAA GAGTATTGCG GAAAATGCAA 2700 
TATATTTGTC TAACTTTATT CTTCATGAAG AATAGTTCAC CATATGATAA TGAATAAAAT 2760 
CAAAAAAATA CTTAAATTTT GCACTTTAAA AAAATATGAT ACATCAAGTG CTTTAGGTAG 2820 
AGAACAGGAA AGGTACAGGA TTATATCCTT GTCTGTTATT TCAAGTTTGA TTAGTAAAAT 2880 
ACTCTCACTA CTTTCTCTTA TATTAACTGT AAGTTTAACT TTACCTTATT TAGGACAAGA 2940 
GAGATTTGGT GTATGGATGA CTATTACCAG TCTTGGTGCT GCTCTGACAT TTTTGGACTT 3000 
AGGTATAGGA AATGCATTAA CAAACAGGAT CGCACATTCA TTTGCGTGTG GCAAAAATTT 3060 
AAAGATGAGT CGGCAAATTA GTGGTGGGCT CACTTTGCTG GCTGGATTAT CGTTTGTCAT 3120
AACTGCAATA TGCTATATTA CTTCTGGCAT GATTGATTGG CAACTAGTAA TAAAAGGTAT 3180 
AAACGAGAAT GTGUATGCAG AGTTACAACA CTCAATTAAA GTCTTTGTAA TCATATTTGG 3240

ACTTGGAATT TATTCAAATG GTGTGCAAAA AGTTTATATG GGAATACAAA AAGCCTATAT 3300 

AAGTAATATT GTTAATGCCA TATTTATATT GTTATCTATT ATTACTCTAG TAATATCGTC 3360

GAAACTACAT GCGGGACTAC CAGTTTTAAT TGTCAGCACT CTTGGTATTC AATACATATC 3420

GGGAATCTAT TTAACAATTA ATCTTATTAT AAAGCGATTA ATAAAGTTTA CAAAAGTTAA 3480 

CATACATGCT AAAAGAGAAG CTCCATATTT GATATTAAAC GGTTTTTTCT TTTTTATTTT 3540 

ACAGTTAGGC ACTCTGGCAA CATGGAGTGG TGATAACTTT ATAATATCTA TAACATTGGG 3600 

TGTTACTTAT GTTGCTGTTT TTAGCATTAC ACAGAGATTA TTTCAAATAT CTACGGTCCC 3660 

TCTTACGATT TATAACATCC CGTTATGGGC TGCTTATGCA GATGCTCATG CACGCAATGA 3720 

TACTCAATTT ATAAAAAAGA CGCTCAGAAC ATCATTGAAA ATAGTGGGTA TTTCATCATT 3780 

CTTATTGGCC TTCATATTAG TAGTGTTCGG TAGTGAAGTC GTTAATATTT GGACAGAAGG 3840 

AAAGATTCAG GTACCTCGAA CATTCATAAT AGCTTATGCT TTATGGTCTG TTATTGATGC 3900

TTTTTCGAAT ACATTTGCAA GCTTTTTAAA TGGTTTGAAC ATAGTTAAAC AACAAATGCT 3960 

TGCTGTTGTA ACATTGATAT TGATCGCAAT TCCAGCAAAA TACATCATAG TTAGCCATTT 4020 

TGGGTTAACT GTTATGTTGT ACTGCTTCAT TTTTATATAT ATTGTAAATT ACTTTATATG 4080 

GTATAAATGT AGTTTTAAAA AACATATCGA TAGACAGTTA AATATAAGAG GATGAAAATG 4140

AAATATATAC CAGTTTACCA ACCGTCATTG ACAGGAAAAG AAAAAGAATA TGTAAATGAA 4200 

TGTCTGGACT CAACGTGGAT TTCATCAAAA GGAAACTATA TTCAGAAGTT TGAAAATAAA 4260 

TTTGCGGAAC AAAACCATGT GCAATATGCA ACTACTGTAA GTAATGGAAC GGTTGCTCTT 4320 

CATTTAGCTT TGTTAGCGTT AGGTATATCG GAAGGAGATG AAGTTATTGT TCCAACACTG 4380

ACATATATAG CATCAGTTAA TGCTATAAAA TACACAGGAG CCACCCCCAT TTTCGTTGAT 4440 

TCAGATAATG AAACTTGGCA AATGTCTGTT AGTGACATAG AACAAAAAAT CACTAATAAA 4500 

ACTAAAGCTA TTATGTGTGT CCATTTATAC GGACATCCAT GTGATATGGA ACAAATTGTA 4560

GAACTGGCCA AAAGTAGAAA TTTGTTTGTA ATTGAAGATT GCGCTGAAGC CTTTGGTTCT 4620

AAATATAAAG GTAAATATGT GGGAACATTT GGAGATATTT CTACTTTTAG CTTTTTTGGA 4680 

AATAAAACTA TTACTACAGG TGAAGGTGGA ATGGTTGTCA CGAATGACAA AACACTTTAT 4740 

GACCGTTGTT TACATTTTAA AGGCCAAGGA TTAGCTGTAC ATAGGCAATA TTGGCATGAC 4800 

GTTATAGGCT ACAATTATAG GATGACAAAT ATCTGCGCTG CTATAGGATT AGCCCAGTTA 4860 

GAACAAGCTG ATGATTTTAT ATCACGAAAA CGTGAAATTG CTGATATTTA TAAAAAAAAT 4920 

ATCAACAGTC TTGTACAAGT CCACAAGGAA AGTAAAGATG TTTTTCACAC TTATTGGATG 4980 

GTCTCAATTC TAACTAGGAC CGCAGAGGAA AGAGAGGAAT TAAGGAATCA CCTTGCAGAT 5040 

AAACTCATCG AAACAAGGCC AGTTTTTTAC CCTGTCCACA CGATGCCAAT GTACTCGGAA 5100 

AAATATCAAA AGCACCCTAT AGCTGAGGAT CTTGGTTGGC GTGGAATTAA TTTACCTAGT 5160 

TTCCCCAGCC TATCGAATGA GCAAGTTATT TATATTTGTG AATCTATTAA CGAATTTTAT 5220 
AGTGATAAAT AGCCTAAAAT ATTGTAAAGG TCATTCATGA AAATTGCGTT GAATTCAGAT 5280

GGATTTTACG AGTGGGGCGG TGGAATTGAT TTTATTAAAT ATATTCTGTC AATATTAGAA 5340 

ACGAAACCAG AAATATGTAT CGATATTCTT TTACCGAGAA ATGATATACA TTCTCTTATA 5400 

AGAGAAAAAG CATTTCCTTT TAAAAGTATA TTAAAAGCAA TTTTAAAGAG GGAAAGGCCT 5460 

CGATGGATTT CATTAAATAG ATTTAATGAG CAATACTATA GAGATGCCTT TACACAAAAT 5520 

AATATAGAGA CGAATCTTAC CTTTATTAAA AGTAAGAGCT CTGCCTTTTA TTCATATTTT 5580 

GATAGTAGCG ATTGTGATGT TATTCTTCCT TGCATGCGTG TTCCTTCGGG AAATTTGAAT 5640 

AAAAAAGCAT GGATTGGTTA TATTTATGAC TTTCAACACT GTTACTATCC TTCATTTTTT 5700 

AGTAAGCGAG AAATAGATCA AAGGAATGTG TTTTTTAAAT TGATGCTCAA TTGCGCTAAC 5760 

AATATTATTG TTAATGCACA TTCAGTTATT ACCGATGCAA ATAAATATGT TGGGAATTAT 5820 

TCTGCAAAAC TACATTCTCT TCCATTTAGT CCRTGCCCTC AATTAAAATG GTTCGCTGAT 5880 

TACTCTGGTA ATATTGCCAA ATATAATATT GACAAGGATT ATTTTATAAT TTGCAATCAA 5940 

TTTTGGAAAC ATAAAGATCA TGCAACTGCT TTTAGGGCAT TTAAAATTTA TACTGAATAT 6000 

AATCCTGATG TTTATTTAGT ATGCACGGGA GCTACTCAAG ATTATCGATT CCCTGGATAT 6060 

TTTAATGAAT TGATGGTTTT GGCAAAAAAG CTCGGAATTG AATCGAAAAT TAAGATATTA 6120 

GGGCATATAC CTAAACTTGA ACAAATTGAA TTAATCAAAA ATTGCATTGC TGTAATACAA 6180 

CCAACCTTAT TTGAAGGCGG GCCTGGAGGG GGGGTAACAT TTGACGCTAT TGCATTAGGG 6240 

AAAAAAGTTA TACTATCTGA CATAGATGTC AATAAAGAAG TTAATTGCGG TGATGTATAT 6300 

TTCTTTCAGG CAAAAAACCA TTATTCATTA AATGACGCGA TGGTAAAAGC TGATGAATCT 6360 

AAAATTTTTT ATGAACCTAC AACTCTGATA GAATTGGGTC TCAAAAGACG CAATGCGTGT 6420 

GCAGATTTTC TTTTAGATGT TGTGAAACAA GAAATTGAAT CCCGATCTTA ATATATTCAA 6480 

GAGGTATATA ATGACTAAAG TCGCTCTTAT TACAGGTGTA ACTGGACAAG ATGGATCTTA 6540 

TCTAGCTGAG TTTTTGCTTG ATAAAGGGTA TGAAGTTCAT GGTATCAAAC GCCGAGCCTC 6600 

ATCTTTTAAT ACAGAACGCA TAGACCATAT TTATCAAGAT CCACATGGTT CTAACCCAAA 6660 

TTTTCACTTG CACTATGGAG ATCTGACTGA TTCATCTAAC CTCACTAGAA TTCTAAAGGA 6720 

GGTACAGCCA GATGAAGTAT ATAATTTAGC TGCTATGAGT CACGTAGCAG TTTCTTTTGA 6780

GTCTCCAGAA TATACAGCCG ATGTCGATGC AATTGGTACA TTACGTTTAC TGGAAGCAAT 6840 

TCGCTTTTTA GGATTGGAAA ACAAAACGCG TTTCTATCAA GCTTCAACCT CAGAATTATA 6900 

TGGACTTGTT CAGGAAATCC CTCAAAAAGA ATCGACCCCT TTTTATCCTC GTTCCCCTTA 6960 

TGCAGTTGCA AAACTTTACG GATATTGGAT CACGGTAAAT TATCGAGAGT CATATGGTAT 7020 

TTATGCATGT AATGGTATAT TGTTCAATCA TGAATCTCCA CGCCGTGGAG AAACGTTTGT 7080 

AACAAGGAAA ATTACTCGAG GACTTGCAAA TATTGCACAA GGCTTGGAAT CATGTTTGTA 7140 

TTTAGGGAAT ATGGATTCGT TACGAGATTG GGGACATGCA AAAGATTATG TTAGAATGCA 7200 

ATGGTTGATG TTACAACAGG AGCAACCCGA AGATTTTGTG ATTGCAACAG GAGTCCAATA 7260 
CTCAGTCCGT CAGTTTGTCG AAATGGCAGC AGCACAACTT GGTATTAAGA TGAGCTTTGT 7320

TGGTAAAGGA ATCGAAGAAA AAGGCATTGT AGATTCGGTT GAAGGACAGG ATGCTCCAGG 7380

TGTGAAACCA GGTGATGTCA TTGTTGCTGT TGATCCTCGT TATTTCCGAC CAGCTGAAGT 7440

TGATACTTTG CTTGGAGATC CGAGCAAAGC TAATCTCAAA CTTGGTTGGA GACCAGAAAT 7500

TACTCTTGCT GAAATGATTT CTGAAATGGT TGCCAAAGAT CTTGAAGCCG CTAAAAAACA 7560

TTCTCTTTTA AAATCGCATG GTTTTTCTGT AAGCTTAGCT CTGGAATGAT GATGAATAAG 7620

CAACGTATTT TTATTGCTGG TCACCAAGGA ATGGTTGGAT CAGCTATTAC CCGACGCCTC 7680

AAACAACGTG ATGATGTTGA GTTGGTTTTA CGTACTCGGG ATGAATTGAA CTTGTTGGAT 7740

AGTAGCGCTG TTTTGGATTT TTTTTCTTCA CAGAAAATCG ACCAGGTTTA TTTGGCAGCA 7800

GCAAAAGTCG GAGGTATTTT AGCTAACAGT TCTTATCCTG CCGATTTTAT ATATGAGAAT 7860

ATAATGATAG AGGCGAATGT CATTCATGCT GCCCACAAAA ATAATGTAAA TAAACTGCTT 7920

TTCCTCGGTT CGTCGTGTAT TTATCCTAAG TTAGCACACC AACCGATTAT GGAAGACGAA 7980

TTATTACAAG GGAAACTTGA GCCAACAAAT GAACCTTATG CTATCGCAAA AATTGCAGGT 8040

ATTAAATTAT GTGAATCTTA TAACCGTCAG TTTGGGCGTG ATTACCGTTC AGTAATGCCA 8100

ACCAATCTTT ATGGTCCAAA TGACAATTTT CATCCAAGTA ATTCTCATGT GATTCCGGCG 8160

CTTTTGCGCC GCTTTCATGA TGCTGTGGAA AACAATTCTC CGAATGTTGT TGTTTGGGGA 8220

AGTGGTACTC CAAAGCGTGA ATTCTTACAT GTAGATGATA TGGCTTCTGC AAGCATTTAT 8280

GTCATGGAGA TGCCATACGA TATATGGCAA AAAAATACTA AAGTAATGTT GTCTCATATC 8340

AATATTGGAA CAGGTATTGA CTGCACGATT TGTGAGCTTG CGGAAACAAT AGCAAAAGTT 8400

GTAGGTTATA AAGGGCATAT TACGTTCGAT ACAACAAAGC CCGATGGAGC CCCTCGAAAA 8460

CTACTTGATG TAACGCTTCT TCATCAACTA GGTTGGAATC ATAAAATTAC CCTTCACAAG 8520

GGTCTTGAAA ATACATACAA CTGGTTTCTT GAAAACCAAC TTCAATATCG GGGGTAATAA 8580

TGTTTTTACA TTCCCAAGAC TTTGCCACAA TTGTAAGGTC TACTCCTCTT ATTTCTATAG 8640

ATTTGATTGT GGAAAACGAG TTTGGCGAAA TTTTGCTAGG AAAACGAATC AACCGCCCGG 8700

CACAGGGCTA TTGGTTCGTT CCTGGTGGTA GGGTGTTGAA AGATGAAAAA TTGCAGACAG 8760

CCTTTGAACG ATTGACAGAA ATTGAACTAG GAATTCGTTT GCCTCTCTCT GTGGGTAAGT 8820

TTTATGGTAT CTGGCAGCAC TTCTACGAAG ACAATAGTAT GGGGGGAGAC TTTTCAACGC 8880

ATTATATAGT TATAGCATTC CTTCTTAAAT TACAACCAAA CATTTTGAAA TTACCGAAGT 8940

CACAACATAA TGCTTATTGC TGGCTATCGC GAGCAAAGCT GATAAATGAT GACGATGTGC 9000

ATTATAATTG TCGCGCATAT TTTAACAATA AAACAAATGA TGCGATTGGC TTAGATAATA 9060

AGGATATAAT ATGTCTGATG CGCCAATAAT TGCTGTAGTT ATGGCCGGTG GTACAGGCAG 9120

TCGTCTTTGG CCACTTTCTC GTGAACTATA TCCAAAGCAG TTTTTACAAC TCTCTGGTGA 9180

TAACACCTTG TTACAAACGA CTTTGCTACG ACTTTCAGGC CTATCATGTC AAAAACCATT 9240

AGTGATAACA AATGAACAGC ATCGCTTTGT TGTGGCTGAA CAGTTAAGGG AAATAAATAA 9300
ATTAAATGGT AATATTATTC TAGAACCATG CGGGCGAAAT ACTGCACCAG CAATAGCGAT 9360

ATCTGCGTTT CATGCGTTAA AACGTAATCC TCAGGAAGAT CCATTGCTTC TAGTTCTTGC 9420

GGCAGACCAC GTTATAGCTA AAGAAAGTGT TTTCTGTGAT GCTATTAAAA ATGCAACTCC 9480

CATCGCTAAT CAAGGTAAAA TTGTAACGTT TGGAATTATA CCAGAATATG CTGAAACTGG 9540

TTATGGGTAT ATTGAGAGAG GTGAACTATC TGTACCGCTT CAAGGGCATG AAAATACTGG 9600

TTTTTATTAT GTAAATAAGT TTGTCGAAAA GCCTAATCGT GAAACCGCAG AATTGTATAT 9660

GACTTCTGGT AATCACTATT GGAATAGTGG AATATTCATG TTTAAGGCAT CTGTTTATCT 9720

TGAGGAATTG AGAAAATTTA GACCTGACAT TTACAATGTT TGTGAACAGG TTGCCTCATC 9780

CTCATACATT GATCTAGATT TTATTCGATT ATCAAAAGAA CAATTTCAAG ATTGTCCTGC 9840

TGAATCTATT GATTTTGCTG TAATGGAAAA AACAGAAAAA TGTGTTGTAT GCCCTGTTGA 9900

TATTGGTTGG AGTGACGTTG GATCTTGGCA ATCGTTATGG GACATTAGTC TAAAATCGAA 9960

AACAGGAGAT GTATGTAAAG GTGATATATT AACCTATGAT ACTAAGAATA ATTATATCTA 10020

CTCTGAGTCA GCGTTGGTAG CCGCCATTGG AATTGAAGAT ATGGTTATCG TGCAAACTAA 10080

AGATGCCGTT CTTGTGTCTA AAAAGAGTGA TGTACAGCAT GTAAAAAAAA TAGTCGAAAT 10140

GCTTAAATTG CAGCAACGTA CAGAGTATAT TAGTCATCGT GAAGTTTTCC GACCATGGGG 10200

AAAATTTGAT TCGATTGACC AAGGTGAGCG ATACAAAGTC AAGAAAATTA TTGTGAAACC 10260

TGGTGAGGGG CTTTCTTTAA GGATGCATCA CCATCGTTCT GAACATTGGA TCGTGCTTTC 10320

TGGTACAGCA AAAGTAACCC TTGGCGATAA AACTAAACTA GTCACCGCAA ATGAATCGAT 10380

ATACATTCCC CTTGGCGCAG CGTATAGTCT TGAGAATCCG GGCATAATCC CTCTTAATCT 10440

TATTGAAGTC AGTTCAGGGG ATTATTTGGG AGAGGATGAT ATTATAAGAC AGAAAGAACG 10500

TTACAAACAT GAAGATTAAC ATATGAAATC TTTAACCTGC TTTAAAGCCT ATGATATTCG 10560

CGGGAAATTA GGCGAAGAAC TGAATGAAGA TATTGCCTGG CGCATTGGGC GTGCCTATGG 10620

CGAATTTCTC AAACCGAAAA CCATTGTTTT AGGCGGTGAT GTCCGCCTCA CCAGCGAAGC 10680

GTTAAAACTG GCGCTTGCGA AAGGTTTACA GGATGCGGGC GTCGATGTGC TGGATATCGG 10740

TATGTCCGGC ACCGAAGAGA TCTATTTCGC CACGTTCCAT CTCGGAGTGG ATGGCGGCAT 10800

CGAAGTTACC GCCAGCCATA ACCCGATGGA TTACAACGGC ATGAAGCTGG TGCGCGAAGG 10860

GGCTCGCCCG ATCAGCGGTG ATACCGGACT GCGCGATGTC CAGCGTCTGG CAGAAGCCAA 10920

TGACTTCCCT CCTGTCGATG AAACCAAACG TGGTCGCTAT CAGCAAATCA ATCTGCGTGA 10980

CGCTTACGTT GATCACCTGT TCGGTTATAT CAACGTCAAA AACCTCACGC CGCTCAAGCT 11040

GGTGATCAAC TCCGGGAACG GCGCAGCGGG TCCGGTGGTG GACGCCATTG AAGCCCGATT 11100

TAAAGCCCTC GGCGCACCGG TGGAATTAAT CAAAGTACAC AACACGCCGG ACGGCAATTT 11160

CCCCAACGGT ATTCCTAACC CGCTGCTGCC GGAATGCCGC GACGACACCC GTAATGCGGT 11220

CATCAAACAC GGCGCGGATA TGGGCATTGC CTTTGATGGC GATTTTGACC GCTGTTTCCT 11280

GTTTGACGAA AAAGGGCAGT TTATCGAGGG CTACTACATT GTCGGCCTGC TGGCAGAAGC 11340
GTTCCTCGAA AAAAATCCCG GCGCGAAGAT CATCCACGAT CCACGTCTCT CCTGGAACAC 11400

CGTTGATGTG GTGACTGCCG CAGGCGGCAC CCCGGTAATG TCGAAAACCG GACACGCCTT 11460 

TATTAAAGAA CGTATGCGCA AGGAAGACGC CATCTACGGT GGCGAAATGA GCGCTCACCA 11520 

TTACTTCCGT GATTTCGCTT ACTGCGACAG CGGCATGATC CCGTGGCTGC TGGTCGCCGA 11580 

ACTGGTGTGC CTGAAAGGAA AAACGCTGGG CGAAATGGTG CGCGACCGGA TGGCGGCGTT 11640 

TCCGGCAAGC GGTGAGATCA ACAGCAAACT GGCGCAACCC GTTGAGGCAA TTAATCGCGT 11700 

GGAACAGCAT TTTAGCCGCG AGGCGCTGGC GGTGGATCGC ACCGATGGCA TCAGCATGAC 11760

CTTTGCCGAC TGGCGCTTTA ACCTGCGCTC CTCCAACACC GAACCGGTGG TGCGGTTGAA 11820 

TGTGGAATCA CGCGGTGATG TAAAGCTAAT GGAAAAGAAA ACTAAAGCTC TTCTTAAATT 11880 

GCTAAGTGAG TGATTATTTA CATTAATCAT TAAGCGTATT TAAGATTATA TTAAAGTAAT 11940 

GTTATTGCGG TATATGATGA ATATGTGGGC TTTTTTATGT ATAACGACTA TACCGCAACT 12000 

TTATCTAGGA AAAGATTAAT AGAAATAAAG TTTTGTACTG ACCAATTTGC ATTTCACGTC 12060 

ACGATTGAGA CGTTCCTTTG CTTAAGACAT TTTTTCATCG CTTATGTAAT AACAAATGTG 12120 

CCTTATATAA AAAGGAGAAC AAAATGGAAC TTAAAATAAT TGAGACAATA GATTTTTATT 12180 

ATCCCTGTTT ACGATATTAT AGCCAAAGTT GTATCCTGCA TCAGTCCTGC AATATTTCAC 12240 

GAGTGCTTTG TTAACTGAAT ACATGTCTGC CATTTTCCAG ATGATAACGA CGTCATCGCA 12300 

ATTGATGGTA AAACACTTCG GCACACTTAT GACAAGAGTC GTCGCAGAGG AGTGGTTCAT 12360 

GTCATTAGTG CGTTTCAGCA ATGCACAGTC TGGTCCTCGG ATAGATCAAG ACGGATGAGA 12420 

AACCTAATGC GTTCACAGTT ATTCATGAAC TTTCTAAAAT GATGGGTATT AAAGGAAAAA 12480 

TAATCATAAC TGATGCGATG GCTTGCCAGA AAGATATTGC AGAGAAGATA TAAAAACAGA 12540 

GATGTGATTA TTTATTCGCT GTAAAAGGAA ATAAGAGTCG GCTTAATAGA GTCTTTGAGG 12600 

AGATATTTAC GCTGAAAGAA TTAAATAATC CAAAACATGA CAGTTACGCA ATTAGTGAAA 12660 

AGAGGCACGG CAGAGACGAT GTCCGTCTTC ATATTGTTTG AGATGCTCCT GATGAGCTTA 12720 

TTGATTTCAC GTTTGAATGG AAAGGGCTGC AGAATTTATG AATGGCAGTC CACTTTCTCT 12780 

CAATAATAGC AGAGCAAAAG AAAGAATCCG AAATGACGAT CAAATATTAT ATTAGATCTG 12840 

CTGCTTTAAC CGCAGAGAAG TTCGCCACAG TAAATCGAAA TCACTGGCGC ATGGAGAATA 12900 

AGTTGCACAG TAGCCTGATG TGGTAATGAA TGAAATCGAC TATAATATAA GAAGGCGAGT 12960 

TGCATTCGAA TGATTTTCTA GAATGCGGCA CATCGCTATT AATATCTGAC AATGATAATG 13020 

TATTCAAGGC AGGATTATCA TGTAAGATGC GAAAAGCAGT CATGGACAGA AACTTCCTAG 13080 

CGTCAGGCAT TGCAGCGTGC GGGCTTTCAT AATCTTGCAT TGGTTTTGAT AAGATATTTC 13140

TTTGGAGATG GGAAAATGAA TTTGTATGGT ATTTTTGGTG CTGGAAGTTA TGGTAGAGAA 13200 

ACAATACCCA TTCTAAATCA ACAAATAAAG CAAGAATGTG GTTCTGACTA TGCTCTGGTT 13260 

TTTGTGGATG ATGTTTTGGC AGGAAAGAAA GTTAATGGTT TTGAAGTGCT TTCAACCAAC 13320 

TGCTTTCTAA AAGCCCCTTA TTTAAAAAAG TATTTTAATG TTGCTATTGC TAATGATAAG 13380 



ATACGACAGA GAGTGTCTGA GTCAATATTA TTACACGGGG TTGAACCAAT AACTATAAAA 13440

CATCCAAATA GCGTTGTTTA TGATCATACT ATGATAGGTA GTGGCGCTAT TATTTCTCCC 13500

TTTGTTACAA TATCTACTAA TACTCATATA GGGAGGTTTT TTCATGCAAA CATATACTCA 13560 

TACGTTGCAC ATGATTGTCA AATAGGAGAC TATGTTACAT TTGCTCCTGG GGCTAAATGT 13620 

AATGGATATG TTGTTATTGA AGACAATGCA TATATAGGCT CGGGTGCAGT AATTAAGCAG 13680 

GGTGTTCCTA ATCGCCCACT TATTATTGGC GCGGGAGCCA TTATAGGTAT GGGGGCTGTT 13740 

GTCACTAAAA GTGTTCCTGC CGGTATAACT GTGTGCGGAA ATCCAGCAAG AGAAATGAAA 13800 

AGATCGCCAA CATCTATTTA ATGGGAATGC GAAAACACGT TCCAAATGGG ACTAATGTTT 13860 

AAAATATATA TAATTTCGCT AATTTACTAA ATTATGGCTT CTTTTTAAGC TATCCTTTAC 13920 

TTAGTTATTA CTGATACAGC ATGAAATTTA TAATACTCTG ATACATTTTT ATACGTTATT 13980 

CAAGCCGCAT ATCTAGCGGT AACCCCTGAC AGGAGTAAAC AATG 14024 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12441 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Salmonella enterica serovar muenchen serogroup C2 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:



GTTGACAAAT ACCGACCGTA TAATGAATCA AACGTTCTGG ATTGGTATTT ATCCAGGCTT 60

GACTACAGAG CATTTAGATT ATGTCGTAAG TAAGTTTGAA GAATTTTTTG GTTTAAATTT 120

CTAATTTTTA GGATAGGATG CTTGATGTGA ATAAGAAAAT CCTAATGACT GGCGCTACTA 180

GCTTTGTAGG TACCCATCTA CTACATAGTC TCATAAAGGA AGGTTATAGT ATTATTGCAT 240

TAAAGCGTCC TATAACCGAG CCAACGATTA TCAATACCTT GATTGAATGG TTGAATATAC 300

AAGATATAGA AAAAATATGT CAATCATCTA TGAATATTCA TGCGATTGTC CATATTGCAA 360

CAGACTATGG TCGAAACAGA ACCCCTATAT CTGAACAATA TAAATGTAAT GTCCTATTAC 420

CAACAAGACT GCTTGAGTTA ATGCCAGCGC TTAAAACGAA ATTCTTTATT TCTACTGACT 480

CTTTTTTTGG GAAATATGAG AAGCACTATG GATATATGCG TTCTTACATG GCATCTAAAA 540

GACATTTTGT AGAACTATCA AAAATATACG TAGAGGAACA TCCAGACGTT TGTTTTATAA 600

ATTTACGTTT AGAACATGTT TACGGTGAGA GGGATAAAGC AGGTAAAATA ATCCCGTATG 660

TTATCAAAAA AATGAAAAAC AATGAAGATA TTGATTGTAC GATCGCCAGG CAGAAAAGAG 720

ATTTTATTTA TATAGACGAT GTTGTTTCGG CCTATTTGAA AATTTTAAAG GAGGGTTTTA 780
ACGCTGGACA CTATGATGTC GAGGTGGGGA CTGGAAAATC GATAGAGCTA AAAGAAGTGT 840

TTGAGATAAT AAAAAAAGAA ACGCATAGTA GTAGTAAGAT AAATTATGGT GCAGTTGCGA 900 
TGCGTGATGA TGAGATTATG GAGTCACATG CAAATACCTC TTTCTTGACT CGATTAGGTT 960 

GGAGTGCCGA GTTTTCTATT GAGAAGGGTG TGAAAAAAAT GTTGAGTATG AAAGAGTAAT 1020 

GAATCGTATT ATTAGAATGT TAGGTGTAGA TAAAGCAATT CGTTATGTTA TTTTTGGTAA 1080 

GATAATATCT GTATTAACGG GTTTACTGTT AATAATGTTA ATATCACACC ATTTATCTAA 1140 

AGACGCACAG GGCTATTATT ATACATTTAA TTCAGTAGTG GCACTACAGA TAATATTTGA 1200 

ATTGGGGCTA TCAACGGTAA TCATTCAATT CGCTAGCCAT GAAATGTCAG CGTTAAAATA 1260 

TGATTATTCT GAACGAGATA TTATAGGTGA AAGTAAAAAT AAGCAACGTT ACCTATCGTT 1320

ATTTCGGTTG GCAATAAAAT GGTATGCAGT AATAGCTTTG CTAATAATAT TAATAGTCGG 1380

TCCCATCGGG TATGTTTTTT TTACGCAAAA AGAAGGCTTA GGTGTACCTT GGCAAGGGGC 1440 

ATGGTTATTA TTAACAATAG TTACAGCTTT TAATATTTTT CTTGTTTCTG TACTTTCTGT 1500 

CGCTGAAGGG AGTGGGTTAA TTACTGATGT GAATAAAATG AGAATGTATC AGTCGCTGTT 1560 

AGCTGGTATA TTGGCAGTAA GCTTACTTAT TAGTGGCTTT GGACTATATG CTACGTCTGC 1620 

AATAGCTATT TCAGGGACTA TCATATTCTC CATATTTTCA TATAAGTATT TTAAAAAAAT 1680 

TTTCCTGCAA TCTTTAAAGC ATAAAAATAA ATATACTGAA GGTGGTATTT CATGGGTTAA 1740

TGAAATATTT CCTATGCAAT GGCGAATTGC TCTAAGTTGG ATGTCAGGGT ATTTTATTTA 1800 

TTTTGTTATG ACCCCCATTG CATTCAAATA TTTCGGGGCT ATATATGCAG GGCAGTTAGG 1860 

GATGTCTTTA ACATTATGCA ATATGGTAAT GGCTACGGGC CTGGCTTGGA TATCCACTAA 1920 

ATATCCAAAA TGGGGAGTAA TGGTTTCCAA CAAACAGCTT GCGGAACTGA GTAAATCGTT 1980 

CAAAAGTGCA GTAATGCAAT CATCCTTTTT TGTCTTGACA GGATTAACTG GTGTATACAT 2040 

TTCATTATGG TTATTGAAAT TATCTGGTTC AAACATTGGC GAGCGGTTTT TGGGATTGCA 2100 

GGATTTTTTC TTTTTATCTT TAGCAATTAT TGGTAATCAC ATTGTAGCTT GCTTTGCAAC 2160 

CTATATAAGA GCGCATAAAA CTGAAAAAAT GACATTGGCA TCATGTATAA TGGCTCTCTT 2220 

GACTATAACT ACAATGTTGT TTGTTGCATA TTTAGAGTAC TCGAGGTTCT ACATGTTAAT 2280 

GTATGCAGCA CTAACGTGGT TATATTTTGT TCCTCAAACT TATATAATCT TTAAAAGATT 2340 
CAAGAGTTCT TATGAGTAAA AAACCTCTTC TTACTATTGC TATTCCGACA TATAACCGCT 2400 
CTTCATGTTT GGCTCGTTTA CTTGATAGTA TAATTCAACA GGAGAACTAT TGTCATGATG 2460
AACTCGAGGT TATTGTTTGT GATAATGCTT CAACAGATGA AACAGCAAGA ATAGCCAAGA 2520
GTGGCTTAGA TAAAATAAGA AATAGTACTT ATCATCTAAA TGAAGAAAAC TTAGGAATGG 2580
ATGGTAACTT CCAGAAATGT TTTGAGTTAT CAAATGGAAA ATATCTTTGG ATGATTGGCG 2640
ATGATGATCT AATAGTCAAA AATGGTATTT CGAAGGTTTT TTCGATATTA AAGTCCCGGC 2700
CTGCATTAGA TATGGTGTAT GTAAATTCAG CAGCAAAGAC TGAGTTAAAC TATAATGCTG 2760 
ATGTGAGGAC GTCATTCTAC ACAAATGATG TAGATTTTAT TTCAGACGTG AAAGTTATGT 2820 
TCACGTTTAT TTCTGGAATG ATATGTAAGA AAACTGATGC AATTGTCAAA GCCGTTGGTA 2880

TTTTCAGTCC GCAAACTACT GGAAAATATC TTATGCATTT AACATGGCAA TTGCCATTAC 2940 

TTAAACAGGG TGGAGAGTTC GCAGTTATCC ATAATAATAT AATTGAGGCT GAGCCAGATA 3000 

ATTCAGGTGG ATATCATTTA TATAAGGTTT TTTCTAATAA TCTTGCGACA ATCTTTGATG 3060 

TTTTTTATCC CAGAGAGCAC CGTGTAAGTA AAAGAGTTCG CGCATCAGCA TGTTTATTCT 3120 

TACTTAACTT CATAGGCGAT GAAGATAAAA CCAAAAATTT TGCTACAAAT AATTATTTAA 3180 

GAGATTGCGA TAGTGCATTT ATAGATTTAA TTATATATAA ATATGGGCTT AGGTTTTTCT 3240 

ATCTATATCC TAAAACTGTG CCTTTATTTA GAAAAATAAA ATATATTATA AAGACGGTTT 3300 

TAATGCGGAA ATAAAAATTA TTCAAGATGG TTTGCTGAAA ACGACTTATA GGACTATCTA 3360 

ATGTTTGTCT ATAGTTTAAG ATTAAAATTA AATCTTATCA TATCATTATT GAGTAAAGTT 3420 

AGGCGGAAAT CAAAAGCAAA GTTTCTTGTT CTGCTTAGCG GATATGATTT TAAAATGGTT 3480

GGGAAGAATT TTAAATTGAA TGTCAAACCT TACTCTGCAA AAAATAACAC CTCTTCCAAA 3540 

TGGGGTAGTA TGCGGGTTGG TGATAACTGC TGGATTGAAG CTGTATATAA TTATGGTGAT 3600 

GAAAAATTTG AACCTTATTT GTACATAGGT GATCGTATAT GTTTAAGTGA TAATGTTCAT 3660

ATTTCTTGCG TATCATGTTT AATTTTAGAA AACGATATAT TAATTGGTAG CAAAGTTTAT 3720 

ATAGGCGATC ATAGCCATGG CAGTTATAAA GTATGCAGTC CGAAAATAGA ACCGCCAGCA 3780 

AATAAGCCAT TAGGTGATAT TGCTCCTATT AAAATAGGTA ATTGCTGCTG GATTGGAGAT 3840 

AATGCAGTAA TTCTGGCTGG TAGTGAAATT TGTGATGGCT GTGTAATCGC AGCTAATTCA 3900

GTCGTCAAGG ATTTAAAAGT CGATAAGCCA TGTTTAATTG GTGGGGTTCC TGCTAAAGTA 3960

ATAAAGGTAT TTTAAAATGA ATGTTTTTAT CAGTATTTGT ATACCGTCTT ATAATAGAGC 4020 

TGAGTTTTTA GAGCCACTAC TGGATAGCAT ATATAATCAA GATTATTGTT TAAAGAATAA 4080 

TGATTTTGAG GTCATTGTTT GTGAAGATAA ATCTCCACAG AGAGATGAGA TAAACTCTAT 4140 

TATCGAAAAC TATAAAGCAA AAAATAATAA ACAAAATCTT TATGTTAATT TCAATGAAGA 4200 

TAATTTAGGC TATGATAAGA ATTTAAAAAA ATGCATTAGT TTGACGACAG GTAAATATTG 4260 

CATGATCATG GGCAACGATG ATCTATTAGC AGATGGAGCG TTATCAAAAA TAGTGAAAGT 4320 

TTTGAAGGCT AATCCTGAAA TTGTATTGGC TACGCGAGCG TATGGTTGGT TTAAGGAAAA 4380 

TCCGAATGAG TTATGTGATA CTGTTCGTCA TTTAACAGAC GATACTTTAT TTCAGCCGGG 4440 

GGCTGATGCC ATTAAATTTT TCTTCCGTAG AGTTGGAGTT ATTTCAGGCT TTATTGTCAA 4500 

TGCTGAAAAA GCAAAAAAAC TATCGAGTGA TTTATTTGAT GGGCGTTTAT ATTATCAAAT 4560 

GTACCTTGCT GGTATGCTAA TGGCTGAAGG TCAGGGATAC TATTTTAGCG ACGTGATGAC 4620 

ATTGTCGAGG GATACAGAGG CTCCTGACTT TGGTAACGCT GGAACTGAAA AAGGAGTTTT 4680 

CACCCCGGGG GGGTATAAAC CAGAGGGCCG TATACATATG GTTGAAGGCT TGTTGCTAAT 4740 

TGCAAAATAT ATAGAAGATA CAACAAAAAT TGATGGCGTT TATGCTGGAA TTAGAAAAGA 4800 

CTTAGCGAAC TATTTTTATC CTTATATTCG AGATCAACTC GACTTGCCTC TTTATACTTA 4860 
TATTAAAATG ATAAATAAAT TTCGGAAAAT GGGATTTTCA AATGAAAAGC TTTTCTATGT 4920

GCATGCCTTT TTAGGGTATG TACTAAAACG GAGGGGCTAT GATGCTTTAA TTAAATACAT 4980 

TCGTAGCAAA AAAGGCGGTA CTCCGCGTCT TGGTATTTAA CCTCCACTTT CAAAAAATGT 5040 

TATGAATATA CTTCTTGCTG CGATATTAGG CGTTAACTTA TTTTCTCCAT ATATTAGTTC 5100 

GTGGATGGTG GGTATGCTGC CATTTCCACC AGGAGCAATC CTAAGGGATG TACTCAATGT 5160 

ATTTTTTGTG GCGTTAGTGC TAGTTCGAT1' TGTCATTGAT AGGAAAAAAA CTTATTTCCC 5220 

GTTGGTTTTT ACTATTTTTT CATGGTCGdC GGTAATACTA TGGGTAATAG CGTTAACTAT 5280 

ATTCTCACCG GATAAAATTC AAGCAATTAT GGGGGGGCGG AGTTATATTT TATTCCCGGC 5340 

AGTTTTCATA GCATTAGTGA TTTTAAAAGT ATCATACCCG CAATCCTTAA ATATTGAAAA 5400 

AATAGTTTGC TACATAATTT TTCTAATGTT TATGGTTGCG ACAATATCTA TTATTGATGT 5460 

ACTAATGAAT GGAGAGTTCA TTAAATTGCT CGGATATGAT GAGCATTATG CAGGAGAACA 5520 

ATTAAACTTA ATTAATAGCT ATGA^GGGA^, GGTCCGGGCT ACAGGCGGTT TTAGTGATGC 5580 

TCTCAATTTT GGATATATGC TCACATTAGG TGTTTTGTTA TGTATGGAGT GTTTTTCCCA 5640 

AGGATATAAA AGATTATTGA TGOTTATTAT TAGTTTTGTG CTATTTATAG OGATCTGCAT 5700 
GAGTCTTACT AGAGGAGCAA TACTTGTTGkJ TGCGCTTATT TACGCACTTT ATATAATTTC 5760 
AAATCGGAAG ATGCTTTTTT GTGGAATAAC TTTATTTGTA ATAATTATAC CCGTTTTAGC 5820 
AATTTCTACT AATATTTTTG ACAACTATAC AGAAATTTTG ATCGGCAGGT TTACAGATTC 5880
GTCTCAGGCA TCGCGTGGAT CTACACAGGG GCGGATAGAT ATGGCAATTA ATTCATTAAA 5940 
CTTCCTGTCA GAACATCCAT CAGGTATAGG TCTGGGTACT CAAGGTTCAG GAAACATGCT 6000 
TTCGGTAAAA GATAATAGGT TAAATACGGA TAATTATTTT TTCTGGATCG CCCTTGAGAC 6060 
TGGTATTATT GGCTTAATCA TAAATATTAT TTATCTGGCA AGTCAATTTT ATTCTTCAAC 6120 
TTTACTAAAT AGAATATATG GC&G%2ATTC TAGCAATATG CACTATAGAT TATATTTTCT 6180 
CTTTGGAAGT ATATATTTTA TAAGTGCAGC GTTAAGTTCA GCACCTTCGT CATCAACTTT 6240 
TTCTATATAT TATTGGACAG TTTTAGCnlTT GATTCC3VTTT TTAAAATTAA CAAATAGACG 6300 
GTGCACGCGA TAATGAATAA TAAAAAGGTT TTGATGGATA TTAGTTGGTC TAATAAAGGG 6360 
GGGATTGGAC GTTTTACTGA TGAAATTTCT AAACTACTAT GTGATATATC TAAGGAGGAA 6420 
CTATATAGAA AATGTGCTTC TCCG^JTQGCC CCATTAGGTT TAGCAGTCAA TATTTTTCTG 6480 
CGAAAGAAAA CTGATGTGGT TTTTCTTCCT GGCTATATTC CACCACTTTT TTGTTCGAAA 6540
AAGTTCATAA TAACAATAGA TGATCTAAAT CATCTGGATT TAAATGATAA TTCCTCTCTT 6600 
TTTAAGA6GT TATTTTATAA TT^ATAATA AAGCGCGGTT GTAGAAAAGC ATATAAAATA 6660 
TTTACAGTTT CGAATTTTTcf AAAAGAAAGA A^AGTAGCAT GGTCAGGTGT AAACCCTAAT 6720 
AAAATAGTCA CGGTATATAA TGGGGTATCT AGTCTATTTA ATGCCGATGT AAAACCATTG 6780 
AATTTAGGCT ATAAATATTT GCTATGTGTA GGAAACAGAA AAACTCATAA GAATGAGAAG 6840 
TGTGTTATAT CTGCCTTTGC CAAAGCAGAT ATTGATCCAT CAATAAAACT CGTTTTTACT 6900
GGTAATCCTT GTAATGATTT AGAAAAACTA ATAATACAAC ATGGTTTAAG TGAACGTGTA 6960

AAGTTCTTTG GGTTCGTGTC TGAAAAAGAT TTACCATCGT TATATAAGGG CTCGTTAGGA 7020

TTAGTTTTCC CTTCTTTATA TGAAGGTTTT GGATTACCTG TAGTGGAGGG CATGGCCTGT 7080 

GGTATTCCTG TATTAACTTC TCTAACTTCA TCATTGCCAG AGGTGGCTGG AGATGCAGCG 7140 

ATTCTTGTCG ACCCTCTTTC GGAAGATGCT ATTACTAAAG GAATTTCGAG GTTAATTAAT 7200 

GATTCTGAAC TTCGTAAGCA TTTAATCCAA AAGGGGCTTT TGCGGGCAAA GAGGTTCAAT 7260

TGGCAAAACG TGGTTAGTGA GATTGAAATG GTACTGACAG AGGCATGTGA TGGAAATAAA 7320 

TGAAATAAAA ATATCTCTCG TTCATGAGTG GTTATTAAGT TATGCAGGCT CCGAACAGGT 7380

ATCATCTGCC ATCCTGCATG TTTTTCCTGA AGCGAAGTTA TATTCGGTGG TTGATTTTCT 7440 

AACGGATGAA CAAAGAAGAC ATTTTCTGGG GAAATATGCG ACTACCACAT TTATTCAAAA 7500 

TTTACCTAAA GCTAAAAAAT TTTACCAGAA ATATTTACCA CTAATGCCAC TGGCTATTGA 7560 

ACAACTTGAT TTATCAGATG CTAATATCAT CATTAGTAGC GCCCATTCCG TTGCAAAAGG 7620 

TGTTATTTCC GGACCAGATC AGCTTCACAT TAGCTATGTT CATTCTCCTA TTCGATATGC 7680 

GTGGGATTTA CAGCATCAGT ACCTTAATGA GTCTAACCTG AATAAAGGAA TTAAAGGTTG 7740 

GTTAGCAAAA TGGCTTCTTC ACAAAATACG AATTTGGGAT TCTCGAACCG CAAATGGGGT 7800 

TGATCATTTT ATAGCTAATT CTCAATATAT CGCGCGTAGA ATTAAAAAAG TATACAGACG 7860 

TGAGGCTTCA GTTATATATC CGCCTGTAGA TGTGGATAAT TTTGAAGTAA AAAATGAAAA 7920 

GCAAGACTAT TATTTCACAG CATCCCGTAT GGTACCCTAC AAACGTATTG ATCTTATTGT 7980 

CGAAGCCTTT AGTAAAATGC CGGAAAAGAA ATTAGTAGTT ATTGGTGATG GACCGGAGAT 8040 

GAAAAAAATA AAGAGCAAGG CTACAGACAA TATAAAATTG CTCGGTTATC AATCTTTTCC 8100 

TGTTTTAAAA GAGTATATGC AGAGCGCCAG GGCGTTTGTT TTTGCAGCGG AAGAGGACTT 8160 

TGGAATAATA CCTGTCGAAG CTCAAGCTTG CGGTACCCCT GTTATTGCCT TTGGGAAGGG 8220 

TGGGGCCTTA GAAACCGTTC GCCCACTAGG TGTAGAGGAA CCGACTGGCA TTTTCTTCAA 8280 

GGAACAGAAT ATTGCTTCTT TGCATGAAGC TGTTAGTGAA TTTGAAAAAA ATGCATCATT 8340 

TTTTACATCT CAGGCTTGTA GAAAAAATGC AGAAAAATTT TCTCGATCAA GATTTGAACA 8400 

AGAATTTAAG AACTTTGTTA ATGAAAAGTG GAATCTTTTC AAAACAGAAC AGATTATTAA 8460 

ACGTTAATTA TGGTTTATTG AATGTCTAAA TTAATACCAG TAATAATGGC CGGTGGGATT 8520 

GGTAGCCGTT TGTGGCCACT TTCACGTGAA GAGCATCCGA AACAGTTTTT AAGCGTAGAT 8580 

GGTGAATTAT CTATGCTGCA AAACACCATT AAAAGATTGA CTCCTCTTTT GGCTGGAGAA 8640 

CCTTTAGTCA TTTGTAATGA TAGTCACCGC TTCCTTGTCG CTGAACAACT TCGAGCTATA 8700 

AATAAACTAG CAAATAACAT CATATTAGAG CCAGTGGGGC GTAATACAGC CCCAGCTATA 8760 

GCGCTGGCCG CTTTTTGTTC ACTTCAGAAT GTCGTCGATG AAGACCCGCT TTTGCTTGTC 8820 

CTTGCTGCGG ATCATGTCAT CCGCGATGAG AAAGTGTTTC TTAAAGCTAT CAATCACGCT 8880 

GAATTTTTTG CAACACAAGG TAAGCTAGTA ACGTTTGGTA TTGTACCCAC ACAGGCCGAA 8940 
ACTGGCTACG GTTATATTTG TAGAGGTGAA GCAATCGGGG AAGATGCTTT TTGTGTAGCC 9000

GAATTTGTAG AGAAGCCTGA TTTCGATACA GCGCGTCATT ATGTAGAATC AGAGAAATAT 9060 

TATTGGAACA GCGGTATGTT CCTATTTCGT GCAAGTAGTT ACTTACAAGA ATTAAAGGAT 9120 

CTGTCCCCCG ATATTTACCA AGCATGTGAA AATGCGGTAG GGAGTATTAA TCCTGATCTT 9180 

GATTTTATCC GTATTGATAA AGAAGCATTC GCAATGTGCC CTAGTGATTC TATCGATTAT 9240 

GCGGTAATGG AACATACTAG GCATGCAGTT GTCGTACCGA TGAATGCCGG CTGGTCAGAT 9300 

GTGGGGTCAT GGTCTTCACT GTGGGATATT TCTAAGAAAG ATCCACAACG TAATGTATTA 9360 

CATGGCGATA TTTTTGCATA TAATAGTAAA GATAATTATA TCTATTCTGA AAAATCGTTT 9420 

ATTAGTACAA TCGGAGTAAA TAATTTAGTT ATCGTGCAGA CAGCAGATGC ATTATTAGTA 9480 

TCTGATAAAG ATTCAGTCCA GGATGTTAAA AAAGTTGTTG ATTATTTAAA AGCTAATAAT 9540 

AGAAACGAAC ATAAAAAACA TTTAGAGGTT TTCCGACCGT GGGGAAAATT TAGCGTAATT 9600 

CATAGTGGCG ATAATTATTT AGTTAAAAGA ATAACTGTTA AACCAGGCGC GAAGTTTGCT 9660 

GCTCAGATGC ATCTCCATCG TGCTGAGCAT TGGATAGTGG TATCTGGTAC TGCTTGTATT 9720 

ACTAAGGGGG AAGAAATTTT TACAATTTCG GAGAATGAAT CAACATTTAT ACCTGCTAAT 9780 

ACAGTTCATA CGTTAAAAAA CCCCGCGACT ATTCCATTAG AACTAATAGA AATTCAATCT 9840

GGCACCTATC TTGCGGAGGA TGATATTATT CGCCTGGAGA AACATTCTGG ATATCTGGAG 9900 

TAATGAATTG ATGAAAAATA TATATAATAC TTACGATGTT ATCAACAAAT CTGGAATTAA 9960 

TTTTGGAACC AGTGGTGCCC GCGGCCTTGT TACCGATTTT ACACCCGAAG TTTGCGCACG 10020 

ATTTACCATT TCCTTTTTGA CAGTAATGCA GCAAAGATTC TCATTTACAA CGGTTGCGCT 10080 

CGCAATTGAT AATCGTCCAA GCAGTTACGC GATGGCTCAA GCTTGTGCCG CTGCTTTGCA 10140 

AGAAAAAGGA ATTAAAACCG TTTACTATGG CGTAATTCCA ACACCTGCTT TAGCTCATCA 10200 

ATCAATTTCC GATAAAGTAC CTGCAATCAT GGTTACTGGC AGTCATATCC CTTTTGACCG 10260 

TAATGGCCTG AAATTTTATA GACCAGATGG TGAAATTACT AAAGATGATG AGAATGCTAT 10320 

TATTCATGTT GATGCCTCAT TTATGCAGCC TAAGCTTGAA CAATTGACAA TTTCCACAAT 10380 

CGCTGCTAGA AATTATATTC TACGATATAC CTCATTATTT CCAATGCCAT TCTTGAAAAA 10440 

TAAGCGCATT GGAATTTATG AGCATTCTAG TGCGGGTCGT GATCTCTATA AGACGTTATT 10500 

CAAAATGTTG GGTGCTACAG TTGTTAGTTT AGCAAGGAGC GACGAATTTG TTCCTATTGA 10560 

TACTGAAGCT GTAAGTGAAG ATGATAGAAA TAAAGCAATC ACATGGGCAA AAAAATATCA 10620 

GTTAGATGCT ATATTTTCAA CTGATGGTGA TGGAGATCGC CCTCTGATAG CTGACGAATA 10680 

TGGAAATTGG TTAAGAGGAG ATATATTAGG CCTTCTGTGC TCTCTCGAAT TAGCTGCTGA 10740 

TGCAGTCGCT ATTCCTGTAA GCTGCAACAG TACAATCTCA TCTGGTAACT TTTTTAAACA 10800 

TGTGGAACGA ACAAAGATTG GTTCACCCTA TGTGATTGCA GCATTTGCTA AATTATCTGC 10860 

AAACTATAAT TGTATAGCTG GTTTTGAAGC GAATGGTGGC TTTCTGCTAG GTAGCGATGT 10920 

TTATATTAAT CAGCGTTTAC TTAAGGCATT ACCAACACGT GATGCTTTAT TACCTGCCAT 10980
TATGCTTCTG TTTGGTAGCA AGGACAAAAG TATTAGTGAG CTTGTTAAAA AACTTCCTGC 11040

TCGCTATACC TATTCAAACA GATTACAGGA TATAAGTGTT AAAACAAGTA TGTCTTTAAT 11100 

AAATCTTGGT CTGACAGATC AAGAGGATTT TTTGCAGTAT ATTGGTTTTA ATAAACATCA 11160 

TATATTACAT TCTGATGTTA CTGATGGCTT TAGAATCACT ATCGATAACA ACAATATTAT 11220

TCATTTACGA CCTTCAGGCA ATGCCCCTGA GTTGCGTTGC TATGCGGAGG CTGACTCGCA 11280 

AGAGGATGCA TGTAATATTG TTGAAACTGT TCTCTCTAAT ATCAAAAGCA AACTGGGTAG 11340 

AGCTTAATGC TGTTGATAAT AGAGCGTTTC TTTCCAGTAA TACTTTGTCT GGTTATCTGG 11400

TACCCAAGTT GAGGGTGAGA ATTAAATGGA TCGTTTTGAT AATAAGTATA ACCCAAATTT 11460 

ATGCAAAATA TTATTGGCTA TATCAGATTT AC^GTTTTTT AATGTAGCCT TATGGGCATC 11520 

GTTAGGAGTT GTATATTTAA TCTTTGATGA AGTTCAGCGA TTTGTACCAC AAGAGCAATT 11580 

AGATAATCGA TTTATATCAC ATTTTATTCT ATCTATAGTA TGCGTTGGAT GGTTTTGGGT 11640 

TCGACTGCGT CACTATACAT ATCGAAAGCC ATTCTGGTAT GAGTTGAAAG AGGTTATTCG 11700

TACTATCGTT ATTTTTGCTG TGTTTGATTT GGCTTTAATT GCGTTTACAA AATGGCAGTT 11760 

TTCACGCTAT GTCTGGGTGT TTTGTTGGAC TTTTGCCATA ATCCTGGTGC CTTTTTTTCG 11820 

CGCACTTACA AAGCATTTAT TGAACAAGCT AGGTATCTGG AAGAAAAAAA CTATCATCCT 11880

TGGGAGCGGA CAGAATGCTC GTGGTGmTA TOCyGCGCTG. CAAAGTGAGG AGATGATGGG 11940 

GTTTGATGTT ATCGCTTTTT TTGATACGGA TOCGTCAGAT GCTGAAATAA ATATGTTGCC 12000 

GGTGATAAAG GACACTGAGA CTATTTGGGA TTTAAATCGT ACAGGTGATG TCCATTATAT 12060 

CCTTGCTTAT GAATACACCG AGTTGGAGAA AACACATTTT TGGCTACGTG AACTTTCAAA 12120 

ACATCATTGT CGTTCTGTTA CTGTCGTCCC CTCGTTTAGA GGATTGCCAT TATATAATAC 12180 

TGATATGTCT TTTATCTTTA GCCATGAAGT TAT*GTTATTA AGGATACAAA ATAACTTGGC 12240 

TAAAAGGTCG TCCCGTTTTC TCAAACGGAC ATTTGATATT GTTTGTTCAA TAATGATTCT 12300 

TATAATTGCA TCACCACTTA TGATTTATCT GTGGTATAAA GTTACTCGAG ATGGTGGTCC 12360 
GGCTATTTAT GGTCACCAGC GAGTAGGTCG GCATGGAAAA CTTTTTCCAT GCTACAAATT 12420 
TCGTTCTATG GTTATGAATT C 12441 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22080 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iv) ANTI-SENSE: YES

(vi) ORIGINAL SOURCE:

(A) ORGANISM: S. enterica serovar typhimurium (serogroup B)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
GAATTCGGGA GGCGCAATGA AAGTCAGCTT TTTTCTGCTG AAATTTCCAC TCTCATCGGA 60

AACCTTTGTG CTGAATCAGA TTACTGCGTT TATTGATATG GGCCATGAGG TGGAGATTGT 120

CGCGTTACAA AAAGGCGATA CCCAACATAC TCACGCCGCC TGGGAGAAGT ATGGCCTGGC 180 

GGCGAAAACC CGCTGGTTAC AGGATGAGCC CCAGGGACGG CTGGCGAAAC TGCGCTACCG 240 

GGCATGTAAA ACGCTGCCGG GGCTGCATCG GGCGGCGACC TGGAAAGCGC TCAATTTTAC 300 

CCGCTATGGC GATGAATCAC GCAATTTGAT CCTTTCCGCG ATTTGCGCGC AGGTGAGCCA 360 

GCCTTTTGTG GCGGATGTGT TTATCGCACA CTTTGGTCCG GCGGGCGTGA CGGCGGCCAA 420 

ACTACGCGAA CTGGGCGTGC TTCGCGGCAA AATCGCGACT ATTTTCCACG GGATTGATAT 480 

CTCTAGTCGT GAGGTGCTCA GTCATTACAC GCCGGAGTAT CAGCAGTTGT TTCGTCGTGG 540 

CGATCTGATG CTGCCCATCA GCGATCTGTG GGCCGGTCGC CTGAAAAGTA TGGGCTGTCC 600 

GCCGGAAAAG ATTGCCGTTT CGCGCATGGG CGTCGACATG ACGCGTTTTA CCCATCGTTC 660 

GGTGAAAGCG CCAGGGATGC CGCTGGAGAT GATTTCCGTC GCGCGCCTGA CAGAAAAAAA 720 

AGGCCTGCAT GTGGCGATTG AAGCCTGTCG GCAACTGAAA GCACAGGGCG TGGCGTTTCG 780 

CTACCGCATT CTGGGGATTG GCCCGTGGGA ACGTCGGCTG CGCACGCTCA TCGAGCAGTA 840 

TCAGCTAGAG GATGTCATTG AGATGCCGGG GTTTAAACCG AGCCATGAAG TGAAGGCGAT 900 

GCTGGATGAC GCCGATGTTT TTTTGCTGCC GTCGATTACC GGTACGGATG GCGATATGGA 960 

AGGTATTCCG GTAGCGCTGA TGGAGGCGAT GGCGGTAGGG ATTCCCGTGG TATCTACCGT 1020 

GCATAGCGGT ATTCCGGAAC TGGTGGAGGC CGGCAAATCC GGCTGGCTGG TGCCGGAAAA 1080 

CGATGCGCAG GCGCTGGCGG CCCGACTCGC TGAGTTCAGC CGGATTGACC ACGACACGCT 1140 

GGAGTCGGTG ATCACGCGCG CCCGTGAAAA AGTGGCGCAA GATTTTAATC AGCAGGCGAT 1200 

TAATCGCCAG TTAGCCAGCC TGCTACAAAC GATATAAACG AGGTGGTATG CCCGCGACTA 1260 

AATTCTCCCG ACGTACCCTC CTGACGGCAG GTTCTGCGCT TGCTGTTCTT CCTTTTCTGC 1320 

GCGCCTTGCC GGTACAGGCG CGTGAACCTC GCGAGACCGT CGATATTAAG GATTATCCGG 1380 

CGGATGACGG TATCGCCTCG TTCAAACAGG CCTTCGCCGA CGGACAGACC GTGGTCGTAC 1440 

CGCCAGGATG GGTGTGTGAA AATATCAATG CGGCGATAAC GATTCCGGCG GGAAAAACGC 1500 

TGCGGGTACA GGGCGCGGTG CGTGGGAATG GCCGGGGACG GTTTATTTTG CAGGACGGGT 1560 

GTCAGGTGGT GGGGGAGCAG GGCGGCAGTC TGCACAATGT GACGCTGGAT GTTCGCGGGT 1620 

CGGACTGTGT GATTAAAGGC GTGGCGATGA GCGGCTTTGG CCCCGTCGCG CAAATTTTCA 1680 

TCGGTGGTAA GGAACCGCAG GTGATGCGTA ATCTCATTAT CGATGACATC ACCGTTACCC 1740 

ACGCCAACTA CGCCATTCTC CGCCAGGGAT TTCATAACCA AATGGATGGC GCGCGGATTA 1800 

CGCATAGCCG CTTTAGCGAT TTACAGGGGG ACGCCATTGA GTGGAATGTC GCGATTCACG 1860 

ACCGCGACAT CCTGATTTCC GATCATGTCA TCGAACGCAT TAATTGTACC AATGGCAAAA 1920 
TCAACTGGGG GATCGGCATC GGGCTGGCGG GTAGCACCTA TGACAACAGT TATCCTGAAG 1980 

ACCAGGCAGT AAAAAACTTT GTGGTGGCCA ATATTACCGG ATCTGATTGC CGACAGCTTG 2040 

TGCACGTAGA AAATGGCAAA CATTTCGTCA TTCGCAATGT CAAAGCCAAA AACATCACGC 2100 

CCGGTTTCAG TAAAAATGCG GGTATTGATA ACGCAACGAT CGCAATTTAT GGCTGTGATA 2160 

ATTTCGTCAT TGATAATATT GATATGACGA ATAGTGCCGG GATGCTCATC GGCTATGGCG 2220 

TCGTTAAAGG AAAATACCTG TCAATTCCGC AAAACTTTAA ATTAAACGCT ATTCGGTTGG 2280 

ATAATCGCCA GGTTGCTTAT AAATTACGCG GCATTCAAAT TTCCTCCGGC AACACCCCCT 2340 

CTTTTGTCGC CATCACCAAT GTACGGATGA CGCGTGCTAC GCTGGAACTG CATAATCAAC 2400 

CGCAGCACCT CTTTCTGCGC AATATCAACG TGATGCAAAC TTCAGCGATT GGCCCGGCGT 2460 

TAAAAATGCA TTTCGATTTG CGTAAAGATG TACGTGGTCA ATTTATGGCC CGCCAGGACA 2520 

CGCTGCTTTC CCTCGCTAAT GTTCATGCCA TCAATGAAAA CGGGCAGAGT TCCGTGGATA 2580 

TCGACAGGAT TAATCACCAA ACCGTGAATG TCGAAGCAGT GAATTTTTCG CTGCCGAAGC 2640 

GGGGAGGGTA AGTACCGCTA TTTTTACGAA AATTCCTGGG AAAAAGTTGT TCATACTTAA 2700 

TGTTATGGTG CCGACTAAGA CGTAATGTAG AGCGTGCCAT CATTATCCCT GGCAGCAGAG 2760 

TAATTCATGC TGGCGAAAAC AAGCTAAAGA GCTATAATTC AGCAACCATT TTACAGGTGG 2820 

AAGAAACAAT GATGAATTTG AAAGCAGTTA TACCGGTAGC GGGTTTGGGT ATGCATATGT 2880 

TGCCTGCCAC CAAGGCAATC CCAAAAGAGA TGCTACCGAT CGTCGACAAG CCAATGATTC 2940 

AGTACATTGT CGATGAGATT GTGGCTGCAG GGATCAAAGA AATCGTGCTG GTGACTCACG 3000 

CGTCTAAAAA CGCCGTTGAG AACCACTTCG ACACCTCTTA TGAACTTGAA TCACTTCTTG 3060 

AGCAGCGCGT TAAGCGTCAG CTTTTGGCGG AAGTGCAATC TATCTGCCCA CCGGGCGTGA 3120 

CGATTATGAA CGTTCGCCAG GCGCAGCCGT TAGGGCTGGG GCATTCTATT CTGTGCGCGC 3180 

GTCCGGTCGT GGGCGATAAC CCTTTCATTG TGGTACTCCC GGATATTATT ATCGATGATG 3240 

CTACCGCCGA TCCGCTGCGC TATAACCTTG CGGCGATGGT GGCGCGTTTC AATGAAACGG 3300 

GTCGCAGCCA GGTGCTGGCG AAGCGCATGA AAGGTGATTT ATCGGAGTAT TCCGTTATCC 3360 

AGACGAAAGA ACCTCTGGAT AATGAAGGCA AAGTCAGCCG GATTGTGGAG TTTATCGAAA 3420 

AACCGGATCA GCCGCAGACG CTGGATTCCG ATTTGATGGC GGTAGGCCGT TATGTGCTTT 3480 

CAGCCGACAT CTGGGCGGAA CTGGAAAGAA CCGAACCGGG CGCCTGGGGC CGCATCCAGC 3540 

TCACCGATGC CATTGCTGAA CTGGCGAAAA AACAGTCGGT TGACGCGATG CTAATGACGG 3600 

GTGACAGCTA TGACTGCGGT AAAAAAATGG GCTACATGCA GGCATTTGTG AAGTACGGGC 3660 

TGCGCAACCT GAAAGAAGGA GCCAAGTTCC GTAAGAGCAT AGAGCAGCTT TTGCATGAAT 3720 

AAGTATTAAC AACCGTGATA AATGGTTGGT GATAAACATA ATAACGGCAG TGAACATTCG 3780 

AAGCGGCAAG TTGGCTGAAA CGAGTGTTGA CTGCCGTTTT AGTTTTGTAT AAAGGGCTTA 3840 
AGTAACAAGG GGTTATCTGG AGCATTTTAA TGCTGATTTT ATAAGATTAA TCCTTGTTTC 3900 
CGGATGCAAT TAATAAGACA ATTAGCGTTT AAGTTTTAGT GAGCTTTGCC CTGCTGGGCG 3960 
AGGTTTGCAA CAAGTCGATA TGTACGCAGT GCACTGGTAG CTGATGAGCC AGGGGCGGTA 4020 

GCGTGTGTAA CGACTTGAGC AATTAATTTT TATTGGCAAA TTAAATACCA CATTAAATAC 4080 

GCCTTATGGA ATAGAAAAGT GAAGATACTT ATTACTGGCG GGGCAGGTTT TATTGGATCA 4140 

GCTGTTGTCC GCCATATTAT TAAGAATACA CAGGACACTG TAGTTAATAT TGATAAATTA 4200 

ACCTACGCCG GTAATCTTGA ATCCCTTTCT GATATTTCTG AAAGTAATCG CTACAATTTT 4260 

GAACACGCGG ATATTTGTGA TTCCGCTGAA ATAACGCGTA TTTTTGAGCA GTACCAGCCG 4320 

GACGCGGTGA TGCATTTGGC TGCGGAAAGT CATGTGGACC GTTCGATTAC CGGGCCAGCA 4380 

GCATTTATTG AAACCAATAT CGTCGGCACC TATGCACTTC TTGAAGTTGC GCGTAAATAC 4440 

TGGTCTGCCC TTGGCGAAGA TAAAAAAAAT AATTTTCGTT TTCATCATAT TTCCACTGAT 4500 

GAAGTTTACG GCGATTTACC GCATCCTGAT GAAGTTGAAA ACAGCGTTAC GCTGCCGTTA 4560 

TTTACTGAAA CGACGGCATA TGCGCCAAGT AGCCCCTATT CTGCGTCAAA AGCATCCAGC 4620 

GATCATTTAG TCCGTGCCTG GCGGCGTACC TATGGTCTAC CAACGATCGT TACCAATTGT 4680 

TCTAATAACT ATGGCCCTTA TCACTTCCCT GAAAAACTGA TTCCGTTGGT CATTTTGAAC 4740 

GCACTGGAAG GAAAGCCTTT GCCAATTTAT GGCAAAGGGG ATCAGATTCG CGATTGGCTA 4800 

TATGTAGAAG ATCATGCTCG CGCGCTTCAT ATGGTAGTGA CTGAAGGCAA GGCAGGGGAG 4860 

ACTTATAACA TTGGTGGACA CAATGAGAAG AAAAATCTCG ATGTGGTATT TACCATCTGT 4920 

GATCTGCTGG ATGAGATTGT ACCCAAAGCG ACTTCTTATC GTGAACAAAT CACTTATGTC 4980 

GCGGATCGTC CGGGCCATGA TCGTCGTTAT GCCATTGATG CAGGTAAAAT TAGCCGCGAA 5040 

TTAGGCTGGA AACCGCTGGA GACCTTTGAA AGCGGTATTC GTAAAACAGT GGAATGGTAC 5100 

CTTGCAAATA CTCAATGGGT AAACAATGTT AAAAGTGGGG CGTATCAGAG TTGGATAGAA 5160 

CAGAACTATG AAGGACGCCA GTAATGAATA TCTTACTTTT TGGTAAGACA GGGCAAGTAG 5220 

GCTGGGAGTT GCAACGTTCT CTGGCACCGG TAGGGAATCT GATTGCCCTG GATGTCCATT 5280 

CAAAAGAGTT TTGCGGTGAT TTTAGTAATC CGAAAGGCGT TGCCGAAACC GTTCGTAAGC 5340 

TTCGTCCCGA TGTGATTGTT AACGCAGCAG CCCATACTGC AGTAGATAAA GCAGAGTCTG 5400 

AACCAGAACT GGCGCAGTTA CTTAACGCCA CCAGTGTGGA AGCCATCGCT AAAGCAGCCA 5460 

ACGAAACTGG CGCATGGGTA GTGCATTATT CAACCGATTA TGTATTTCCT GGTACCGGCG 5520 

ATATCCCATG GCAGGAAACG GACGCTACGT CGCCGCTGAA TGTCTATGGC AAAACCAAAC 5580 

TGGCGGGAGA AAAGGCCCTG CAGGATAACT GCCCTAAACA CCTTATCTTC CGCACCAGTT 5640 

GGGTTTATGC AGGTAAGGGC AATAATTTCG CAAAGACAAT GCTTCGTCTG GCGAAAGAGC 5700 

GTCAGACACT TTCAGTCATT AACGATCAGT ACGGTGCGCC AACCGGTGCG GAATTACTGG 5760 

CTGACTGTAC GGCGCATGCG ATCCGTGTGG CGTTAAATAA ACCAGAAGTC GCAGGTCTTT 5820 

ACCATCTGGT TGCCGGGGGA ACCACAACCT GGCATGACTA CGCGGCCTTA GTCTTTGACG 5880 

AGGCGCGCAA AGCAGGGATA ACGCTTGCGC TGACTGAGCT TAATGCTGTG CCGACCAGCG 5940 

CCTACCCGAC GCCGGCGAGC AGACCAGGCA ATTCGCGTCT CAATACTGAA AAGTTTCAGC 6000 
GTAATTTTGA CCTTATTCTG CCTCAATGGG AATTAGGAGT TAAGCGTATG CTGACTGAAA 6060 

TGTTTACGAC GACAACCATC TAATAAATTT AAATGCCCAT CAGGGCATTT TCTATGAATG 6120 

AGAAATGGAA ATGAAAACGC GTAAGGGCAT TATTTTAGCG GGGGGCTCCG GCACCCGTCT 6180 

TTATCCGGTG ACCATGGCGG TAAGTAAGCA ATTGCTACCA ATTTATGATA AACCGATGAT 6240 

TTACTATCCC CTTTCCACGC TTATGCTGGC AGGCATTCGG GATATCCTGA TCATCAGTAC 6300 

GCCACAGGAC ACGCCGCGTT TTCAACAACT GCTGGGAGAC GGCAGCCAGT GGGGGCTGAA 6360 

TCTTCAATAT AAAGTACAGC CAAGCCCGGA TGGCTTAGCA CAGGCGTTTA TTATTGGTGA 6420 

AGAGTTCATT GGTCATGATG ATTGTGCATT AGTGCTGGGT GACAATATCT TCTATGGTCA 6480 

TGATTTACCA AAGTTAATGG AAGCTGCCGT TAATAAAGAA AGTGGTGCTA CCGTCTTCGC 6540 

TTATCATGTA AACGATCCGG AGCGCTACGG TGTGGTTGAG TTTGACCAAA AGGGCACAGC 6600 

CGTXAGTCTG GAAGAAAAAC CATTACAACC GAAGAGTAAT TACGCGGTAA CGGGGCTGTA 6660 

TTTTTATGAT AATAGCGTGG TGGAGATGGC GAAAAATCTT AAGCCTTCCG CTCGCGGTGA 6720 

GTTAGAAATC ACGGATATTA ACCGTATCTA TATGGAGCAG GGAAGATTGT CTGTCGCTAT 6780 

GATGGGGCGC GGTTATGCCT GGCTGGATAC AGGGACGCAT CAGAGTTTGA TAGAGGCCAG 6840 

TAATTTTATT GCAACCATCG AAGAACGCCA GGGGCTAAAA GTGTCCTGCC CGGAAGAGAT 6900 

CGCATTTCGT AAAAATTTTA TAAATGCACA ACAGGTTATA GAACTGGCCG GGCCATTATC 6960 

AAAAAATGAT TATGGCAAAT ATTTGCTGAA GATGGTGAAA GGTTTATAAG TGATGATTGT 7020 

GATTAAAACA GCAATACCAG ATGTCTTGAT CTTAGAGCCT AAAGTTTTTG GCGATGAGAG 7080 

GGGATTCTTT TTTGAAAGTT ATAACCAGCA GACCTTTGAA GAGTTGATTG GACGTAAAGT 7140 

TACATTTGTT CAAGATAATC ATTCAAAATC CAAAAAGAAC GTACTCAGAG GGCTACATTT 7200 

TCAGAGAGGA GAAAATGCAC AGGGGAAGTT AGTTCGTTGT GCTGTCGGTG AGGTTTTTGA 7260 

TGTTGCGGTC GATATCCGAA AAGAATCGCC TACTTTTGGT CAATGGGTTG GTGTAAATCT 7320 

GTCTGCTGAG AATAAGCGAC AGCTTTGGAT TCCAGAAGGT TTTGCTCATG GTTTTGTTAC 7380 

TCTTAGTGAG TATGCAGAGT TTCTGTACAA AGCAACTAAT TATTACTCAC CTTCATCGGA 7440 

AGGTAGCATT CTATGGAATG ATGAGGCAAT AGGTATTGAA TGGCCTTTTT CTCAGCTGCC 7500 

TGAGCTTTCA GCAAAAGATG CTGCAGCACC TTTACTGGAT CAAGCCTTGT TAACAGAGTA 7560 

AGCATCGTGT CTCATATTAT TAAGATTTTT CCATCAAATA TTGAATTTTC CGGTAGAGAG 7620 

GATGAATCAA TCCTCGATGC TGCGCTATCG GCTGGTATCC ATCTTGAACA TAGCTGCAAA 7680 

GCGGGTGATT GTGGTATCTG TGAGTCCGAT TTGTTGGCGG GAGAAGTTGT TGACTCCAAA 7740 

GGTAATATTT TTGGACAGGG TGATAAAATA CTAACCTGCT GCTGTAAACC TAAAACCGCC 7800 

CTTGAGCTAA ATGCGCATTT TTTTCCTGAA CTAGCTGGAC AGACAAAAAA AATTGTCCCA 7860 
TGCAAGGTAA ATAGTGCTGT ACTGGTTTCA GGCGATGTTA TGACTTTGAA GTTACGCACA 7920 
CCACCAACAG CAAAAATTGG CTTCCTTCCA GGGCAGTATA TCAATTTACA TTATAAAGGT 7980 
GTAACTCGCA GTTATTCTAT CGCTAATAGT GATGAGTCGA ATGGTATTGA GTTGCATGTA 8040 
AGGAATGTTC CCAATGGTCA GATGAGTTCG CTCATTTTTG GGGAGTTACA AGAAAATACT 8100 

CTTATGCGCA TTGAAGGGCC TTGCGGAACA TTTTTTATTC GTGAAAGTGA CAGACCTATA 8160 

ATCTTCCTTG CAGGCGGTAC TGGATTCGCT CCAGTTAAAT CAATGGTTGA GCATCTCATT 8220 

CAGGGAAAAT GTCGTCGTGA GATCTACATT TACTGGGGAA TGCAATATAG TAAAGATTTT 8280 

TACTCTGCAT TACCGCAGCA GTGGAGTGAA CAGCACGACA ACGTTCATTA TATCCCTGTT 8340 

GTTTCTGGTG ATGACGCCGA ATGGGGGGGA AGAAAGGGAT TTGTCCATCA TGCCGTGATG 8400 

GATGATTTTG ATTCTCTAGA GTTCTTCGAT ATATATGCAT GTGGTTCACC TGTGATGATC 8460 

GATGCCAGTA AAAAGGACTT TATGATGAAA AATCTCTCTG TAGAACATTT CTATTCTGAT 8520 

GCATTTACCG CATCTAATAA TATTGAGGAT AATTTATGAA AGCGGTCATC CTGGCTGGTG 8580 

GACTTGGTAC CAGACTAAGT GAAGAAACAA TTGTAAAACC AAAACCGATG GTAGAAATTG 8640 

GTGGCAAGCC TATTCTTTGG CACATTATGA AAATGTATTC TGTGCATGGT ATCAAGGATT 8700 

TTATTATCTG CTGTGGTTAT AAAGGATATG TGATTAAAGA ATATTTTGCG AACTACTTCC 8760 

TTCACATGTC AGATGTAACA TTCCATATGG CTGAAAACCG TATGGAAGTT CACCATAAAC 8820 

GTGTTGAACC ATGGAATGTC ACATTGGTTG ATACGGGTGA TTCTTCAATG ACTGGTGGTC 8880 

GTCTGAAACG TGTTGCTGAA TACGTAAAAG ATGACGAGGC TTTCCTGTTT ACTTATGGTG 8940 

ATGGCGTTGC CGACCTTGAT ATCAAAGCGA CTATCGATTT CCATAAGGCT CACGGTAAGA 9000 

AAGCGACTTT AACAGCTACT TTTCCACCAG GACGCTTTGG CGCATTAGAT ATCCGAGCTG 9060 

GTCAGGTCCG GTCATTCCAG GAAAAACCGA AAGGCGATGG GGCAATGATC AATGGTGGTT 9120 

TCTTTGTGTT GAATCCATCG GTTATCGATC TCATCGATAA CGATGCAACA ACCTGGGAAC 9180 

AAGAGCCATT AATGACATTG GCACAACAGG GGGAGTTAAT GGCTTTTGAA CACCCAGGTT 9240 

TCTGGCAGCC GATGGATACC CTACGTGATA AAGTTTACCT CGAAGGGCTG TGGGAAAAAG 9300 

GTAAAGCTCC GTGGAAAACC TGGGAGTAAC TAGATGATTG ATAAAAATTT TTGGCAAGGT 9360 

AAACGTGTAT TCGTTACCGG CCATACTGGC TTTAAAGGAA GCTGGCTTTC GCTATGGCTG 9420 

ACTGAAATGG GTGCAATTGT AAAAGGCTAT GCACTTGATG CGCCAACTGT TCCAAGTTTA 9480 

TTTGAGATAG TGCGTCTTAA TGATCTTATG GAATCTCATA TTGGCGACAT TCGTGATTTT 9540 

GAAAAGCTGC GCAATTCTAT TGCAGAATTT AAGCCAGAAA TTGTTTTCCA TATGGCAGCC 9600 

CAGCCTTTAG TGCGCCTATC TTATGAACAG CCAATCGAAA CATACTCAAC AAATGTTATG 9660 

GGTACTGTCC ATTTGCTTGA AACAGTTAAG CAAGTAGGTA ACATAAAGGC AGTCGTAAAT 9720 

ATCACCAGTG ATAAGTGCTA CGACAATCGT GAGTGGGTGT GGGGCTATCG TGAGAACGAA 9780 

CCCATGGGAG GGTACGATCC ATACTCTAAT AGTAAAGGTT GTGCAGAATT AGTCGCGTCT 9840 

GCATTCCGGA ACTCATTCTT CAATCCTGCA AATTATGAGC AACATGGCGT TGGTTTGGCG 9900 

TCTGTGAGGG CTGGTAATGT CATAGGCGGA GGCGATTGGG CTAAAGACCG TTTAATTCCC 9960 

GATATTCTGC GCTCATTTGA AAATAACCAG CAGGTTATTA TTCGAAACCC ATATTCTATC 10020 

CGTCCCTGGC AGCATGTACT GGAGCCTCTT TCTGGTTACA TTGTGGTGGC GCAACGCTTA 10080 
TATACAGAAG GTGCTAAGTT TTCTGAAGGA TGGAATTTCG GCCCGCGTGA TGAAGATGCG 10140 

AAGACGGTCG AATTTATTGT TGACAAGATG GTCACGCTTT GGGGTGATGA TGCAAGCTGG 10200 

TTACTGGATG GTGAGAATCA TCCTCATGAG GCACATTACC TGAAACTGGA TTGCTCTAAA 10260 

GCAAATATGC AATTAGGATG GCATCCGCGT TGGGGATTGA CTGAAACACT TGGTCGCATC 10320 

GTAAAATGGC ATAAAGCATG GATTCGCGGC GAAGATATGT TGATTTGTTC AAAGCGTGAA 10380 

ATCAGCGACT ATATGTCTGC AACTACTCGT TAAGAAAATA AGTTTAAGGA ATCAAAGTAA 10440 

TGACAGCAAA TAACCTGCGT GAGCAAATCT CTCAGCTTGT CGCTCAGTAT GCGAATGAGG 10500 

CATTGAGCCC GAAACCTTTT GTTGCAGGTA CAAGCGTTGT GCCTCCTTCC GGGAAGGTTA 10560 

TTGGTGCCAA AGAGTTACAA TTGATGGTTG AGGCGTCTCT TGATGGATGG CTAACTACTG 10620 

GTCGTTTCAA TGATGCCTTT GAAAAAAAAC TTGGGGAATT TATTGGGGTT CCTCATGTTT 10680 

TAACGACAAC ATCTGGCTCT TCGGCAAACT TGCTGGCACT GACTGCGCTG ACTTCCCCAA 10740 

AATTAGGCGA GCGAGCTCTC AAACCTGGTG ATGAGGTTAT TACTGTCGCT GCTGGCTTCC 10800 

CGACTACAGT TAACCCGGCG ATCCAGAATG GTTTAATACC GGTATTCGTG GATGTTGATA 10860 

TCCCGACATA TAATATCGAT GCCTCTCTCA TTGAAGCTGC AGTTACTGAG AAATCAAAAG 10920 

CGATAATGAT CGCTCATACA CTCGGTAATG CATTTAACCT GAGTGAAGTT CGTCGGATTG 10980 

CCGATAAATA TAACTTATGG TTGATTGAAG ACTGCTGTGA TGCCCTTGGG ACGACTTATG 11040 

AAGGCCAGAT GGTAGGTACC TTTGGTGACA TCGGAACCGT TAGTTTTTAT CCGGCTCACC 11100 

ATATCACAAT GGGTGAAGGC GGTGCTGTAT TCACCAAGTC AGGTGAACTG AAGAAAATTA 11160 

TTGAGTCGTT CCGTGACTGG GGCCGGGATT GTTATTGTGC GCCAGGATGC GATAACACCT 11220 

GCGGTAAACG TTTTGGTCAG CAATTGGGAT CACTTCCTCA AGGCTATGAT CACAAATATA 11280 

CTTATTCCCA CCTCGGATAT AATCTCAAAA TCACGGACAT GCAGGCAGCA TGTGGTCTGG 11340 

CTCAGTTGGA GCGCGTAGAA GAGTTTGTAG AGCAGCGTAA AGCTAACTTT TCCTATCTGA 11400 

AACAGGGCTT GCAATCTTGC ACTGAATTCC TCGAATTACC AGAAGCAACA GAGAAATCAG 11460 

ATCCATCCTG GTTTGGCTTC CCTATCACCC TGAAAGAAAC TAGCGGTGTT AACCGTGTCG 11520 

AACTGGTGAA ATTCCTTGAT GAAGCAAAAA TCGGTACACG TTTACTGTTT GCTGGAAATC 11580 

TGATTCGCCA ACCGTATTTT GCTAATGTGA AATATCGTGT AGTGGGTGAG TTGACAAATA 11640 

CCGACCGTAT AATGAATCAA ACGTTCTGGA TTGGTATTTA TCCAGGCTTG ACTACAGAGC 11700 

ATTTAGATTA TGTAGTTAGC AAGTTTGAAG AGTTCTTTGG TTTGAATTTC TAATTCAATT 11760 

TATTCTATCT GGTGATTGCG ATGACCTTTT TGAAAGAATA TGTAATTGTC AGTGGGGCTT 11820 

CCGGCTTTAT TGGTAAGCAT TTACTCGAAG CGCTAAAAAA ATCGGGGATT TCAGTTGTCG 11880 

CAATCACTCG AGATGTAATA AAAAATAATA GTAATGCATT AGCTAATGTT AGATGGTGCA 11940 

GTTGGGATAA TATCGAATTA TTAGTCGAGG AGTTATCAAT TGATTCTGCA TTAATTGGTA 12000 

TCATTCATTT GGCAACAGAA TATGGGCATA AAACATCATC TCTCATAAAT ATTGAAGATG 12060 

CAAATGTTAT AAAACCATTA AAGCTTCTTG ATTTGGCAAT AAAATATCGG GCGGATATCT 12120 
TTTTAAATAC AGATAGTTTT TTTGCCAAGA AAGATTTTAA TTATCAACAT ATQCGGCCTT 12180 

ATATAATTAC TAAAAGACAC TTTGATGAAA TTGGGCATTA TTATGCTAAT ATGCATGACA 12240 

TTTCATTTGT AAACATGCGA TTAGAGCATG TATATGGGCC TGGGGATGGT GAAAATAAAT 12300 

TTATTCCATA CATTATCGAC TGCTTAAATA AAAAACAGAG TTGCGTGAAA TGTACAACAG 12360 

GCGAACAGAT AAGAGACTTT ATTTTTGTAG ATGATGTGGT AAATGCTTAT TTAACTATAT 12420 

TAGAAAATAG AAAAGAAGTA CCTfrCATA'TA CTGAGTATCA AGTTGGAACT GGTGCTGGGG 12480 

TAAGTTTGAA AGATTTTCTG GTTTATTTGC AAAATACTAT GATGCCAGGT TCATCGAGTA 12540 

TATTTGAATT TGGTGCGATA GAGCAAAGAG ATAATGAAAT AATGTTCTCT GTAGCAAATA 12600 

ATAAAAATTT AAAAGCAATG GGCTGGAAAC CAAATTTCGA TTATAAAAAA GGAATTGAAG 12660 

AACTACTGAA ACGGTTATGA GATTTTCATG ATCTTTTAAT AAATAAATCG TTAACAAATT 12720 

AGTCGCGTTA TGTTGTAAAA ACTAAGTCGT TTAATTGCAT AGTGAAAGTT CAATTGTTAA 12780 

AAATTCCGAG TCATTTAATT GTTGCAGGtTT CATCATGGTT ATCCAAAATA ATAATTGCCG 12840 

GGGTGCAGTT AGCAAGTATT TCATA^dTTA tTTCTATGCT AGGTGAAGAG AAATATGCAA 12900 

TCTTTAGTTT GTTAACTGGT TTATTAGTAT GGTGTAGCGC TGTTGATTTT GGCATAGGTA 12960 

CAGGACTGCA AAATTATATA tfCAGi&tfGi& ' <&GCCAAAAA CAAAAGTTAT GATGCATATA 13020 

TTAAATCAGC ATTACATCTA AGCTTTATAG CTATTATTTT TTTTATTGCT TTATTTTATA 13080 

TTTTTTCTGG GGTAATTTCC GCTAAATATC TTTCTTCTTT TCATGAGGTA TTACAGGACA 13140 

AAACCAGAAT GCTCTTTTTT ACCTCATGTC TGGTTTTCAG TTCTATTGGA ATCGGAGCTA 13200 

TTGCTTATAA AATACTTTTT GCCGAATTGG TCGGGTGGAA AGCTAATCTA TTAAACGCAT 13260 

TATCTTATAT GATAGGTATG CTCGGCTTGC TATATATATA CTATAGGGGG ATCTCAGTTG 13320 

ACATAAAATT ATCACTAATA GTCCTGTATC TTCCAGTGGG TATGATTTCA TTGTGCTATA 13380 

TTGTATATAG ATACATAAAG CTOTATCATG TTAAAACAAC AAAATCTCAT TATATAGCAA 13440 

TTTTACGTAG ATCTTCAGGG TTTTTTCTTT TTACTTTATT ATCGATAGTG GTGCTTCAAA 13500 
CAGATTATAT GGTCATTTCT CAAAGGCTAA CTCCTGCTGA TATTGTTCAA TATACAGTAA 13560 
CGATGAAAAT TTTTGGTTTA GTCTTTTTTTA TTTATACTGC TATTTTGCAA GCATTATGGC 13620 
CTATATGTGC TGAATTGAGA GTCAAACAGC AATGGAAAAA ACTTAACAAA ATGATAGGTG 13680 
TCAATATTTT GCTTGGCTCA CTATATGTfcG TTGGATGTAC AATATTTATT TATTTATTTA 13740 
AAGAACAGAT ATTTTCAGTA ATAGCCAAAG ATATTAATTA TCAAGTTTCT ATTTTATCTT 13800 
TTATGTTAAT TGGCATATAT TTCTGTATTC GCGTTTGGTG TGACACTTAT GCAATGTTAT 13860 
TGCAAAGTAT GAATTATTTA AAAATACTTT GGATATTAGT ACCACTACAA GCAATAATTG 13920 
GTGGAATAGC ACAATGGTAT Tll'lVl'AG'tA CGCTTGGAAT CAGTGGAGTG CTGCTTGGCT 13980 
TGATTATATC TTTTGCTTTA ACTGTTTTTT GGGGGCTTCC ACTAACTTAC TTAATTAAGG 14040 
CAAATAAGGG ATAATCATAT GCTTATATCA TTTTGTATTC CAACTTATAA TAGAAAACAA 14100 
TATCTTGAAG AGTTGTTGAA TAGTATAAAT AATCAGGAAA AATTTAATTT AGATATTGAG 14160 
ATATGTATAT CAGATAATGC CTCTACTGAT GGTACAGAGG AAATGATTGA TGTTTGGAGG 14220 

AACAATTATA ATTTCCCAAT AATATATCGG CGTAATAGCG TTAACCTTGG GCCAGATAGG 14280 

AATTTTCTTG CTTCAGTATC CCTTGCGAAT GGGGATTATT GTTGGATATT TGGCAGTGAT 14340 

GATGCTCTTG CGAAAGACTC GTTAGCGATA TTACAAACTT ATCTCGATTC TCAAGCAGAT 14400 

ATATATTTAT GTGACAGAAA AGAGACCGGG TGTGATTTAG TTGAGATTAG AAACCCTCAT 14460 

CGTTCTTGGC TCAGAACAGA TGATGAACTT TATGTGTTTA ATAATAATTT AGATAGGGAA 14520 

ATCTATCTCA GTAGATGCTT ATCTATTGGT GGTGTATTTA GCTATCTAAG TTCTTTAATA 14580 

GTAAAAAAAG AACGATGGGA TGCCATTGAT TTTGATGCGT CCTATATTGG CACTTCCTAT 14640 

CCTCATGTAT TTATCATGAT GAGCGTATTT AATACGCCAG GGTGCCTTTT GCATTATATA 14700 

TCAAAACCAC TCGTAATATG CCGAGGAGAT AATGATAGTT TCGAGAAGAA AGGAAAGGCC 14760 

AGACGAATTT TAATTGATTT TATTGCATAT TTAAAATTAG CTAATGATTT TTACAGTAAA 14820 

AATATATCTT TAAAACGAGC ATTTGAAAAT GTTTTGCTAA AAGAGAGACC ATGGTTATAT 14880 

ACAACTTTGG CTATGGCATG TTATGGCAAT AGTGATGAAA AAAGAGATTT ATCTGAATTT 14940 

TATGCAAAGC TAGGTTGTAA TAAAAATATG ATCAACACTG TACTTCGATT TGGGAAACTA 15000 

GCATATGCAG TGAAAAATAT TACCGTGCTT AAGAATTTTA CTAAACGGAT AATTAAGTAG 15060 

TAGTAAGTTA TTATATTGAG ATTAAATGTA GATTTAACCT TTCTGGATTC AGCTAGATTT 15120 

ACGTTACTGA CTTTTCTTTT TAATGAAAAT CATATTTGAT ATATATAAAT AAATTTGGAT 15180 

AGCTTAACTA CTTAGATGTT TTTTTCTGGG AATGTTAGTA TAATAATATA TTTCTTTATG 15240 

ATTGTTTTTG TAGTGTTTTA CTGCCGGTAT TACATTAACT CTATTATTAA GAATTACACC 15300 

TAGTGTAAGC TTCGTAATAT TATTTATCCT TATGATTATT GCTTTAAAGA TGCGTATGGA 15360 
AAAACGGAGA GCTATTCAAT GATCGTAAAC CTATCACGTT TAGGTAAAAG TGGTACGGGA 15420 
ATGTGGCAAT ACTCGATTAA ATTTTTAACG GCACTGCGAG AAATAGCTGA TGTTGACGCA 15480 
ATAATCTGTA GCAAGGTACA CGCTGATTAT TTTGAAAAGC TCGGTTATGC AGTAGTTACT 15540 
GTTCCGAATA TTGTTAGCAA CACATCAAAA ACATCGCGAC TTAGACCATT AGTATGGTAT 15600 
GTATATAGTT ACTGGCTTGC GCTGAGGGTT TTAATTAAGT TTGGTAATAA AAAATTGGTG 15660 
TGTACTACAC ATCACACTAT CCCCTTACTG AGAAACCAAA CGATAACCGT ACATGATATA 15720 
AGACCTTTTT ATTATCCAGA TAGTTTTATT CAGAAAGTGT ATTTTCGCTT TTTATTAAAA 15780 
ATGTCCGTTA AGCGATGTAA GCATGTTTTA ACGGTATCTT ATACCGTTAA AGATAGCATT 15840 
GCTAAAACTT ATAATGTAGA TAGTGAGAAA ATATCAGTAA TTTATAATAG TGTTAATAAA 15900 
TCTGATTTTA TACAAAAAAA AGAAAAAGAG AATTACTTTT TAGCTGTTGG TGCAAGTTGG 15960 
CCACATAAAA ATATTCATTC ATTCATAAAA AATAAAAAAG TTTGGTCTGA CTCTTATAAT 16020 
TTAATTATTG TATGTGGTCG TACTGACTAT GCAATGTCTC TCCAACAAAT GGTCGTTGAT 16080 
CTGGAACTAA AAGATAAAGT GACTTTTTTA CATGAAGTCT CATTTAATGA ATTAAAGATT 16140 
TTATATTCTA AAGCCTACGC GCTTGTTTAT CCATCTATTG ATGAGGGTTT TGGTATACCT 16200 
CCTATTGAAG CGATGGCATC AAATACTCCA GTTATAGTGT CCGATATACC AGTATTTCAT 16260 

GAAGTGTTAA CCAATGGTGC ATTATATGTG AATCCGGATG ATGAAAAAAG CTGGCAGAGT 16320 

GCAATTAAAA ATATAGAGCA GTTGCCTGAT GCAATTTCCC GATTTAACAA CTATGTCGCA 16380 

CGGTATGACT TTGATAATAT GAAGCAGATG GTTGGCAATT GGTTGGCGGA ATCAAAATAA 16440 

ATGAAAATAA CATTAATTAT TCCCACATAT AATGCAGGGT CGCTTTGGCC TAATGTTCTG 16500 

GATGCGATTA AGCAGCAAAC TATATATCCG GATAAATTGA TTGTTATAGA CTCAGGTTCT 16560 

AAAGATGAAA CGGTTCCGTT AGCOTCAGAC CTGAAAAATA TATCAATATT TAATATTGAC 16620 

TCTAAAGATT TTAATCATGG AGGAACCAGA AATTTAGCAG TTGCAAAAAC TCTGGACGCT 16680 

GATGTTATAA TTTTTCTAAC GCAAGATGCA ATTCTCGCGG ATTCGGATGC AATTAAAAAT 16740 

TTGGTTTATT ATTTTTCAGA TCCATTGATA GCAGCGGTTT GTGGTAGACA ACTTCCTCAT 16800 

AAAGATGCTA ATCCCCTTGC AGTGCATGCC AGAAATTTTA ATTATAGTTC AAAATCTATT 16860 

GTTAAAAGTA AGGCAGATAT AGAAAAATTG GGTATTAAAA CTGTATTTAT GTCCAATTCT 16920 

TTTGCTGCCT ATCGCCGTTC CGTTTTTGAA GAGTTAAGTG GGTTTCCTGA ACATACAATT 16980 

CTTGCCGAGG ATATGTTTAT GGCGGCTAAG ATGATTCAGG CGGGTTATAA GGTCGCCTAC 17040 

TGCGCTGAAG CGGTGGTAAG ACACTCCCAT AATTATACCC CGCGAGAAGA GTTTCAACGA 17100 

TATTTTGATA CTGGTGTATT TCATGCTTGT TCTCCGTGGA TTCAGCGTGA CTTTGGCGGA 17160 

GCCGGTGGTG AGGGTTTCCG CTTCGTAAAA TCAGAGATTC AATTCCTGCT TAAAAATGCA 17220 

CCGTTCTGGA TTCCAAGAGC TTTATTAACA ACCTTTGCTA AATTCTTGGG TTACAAATTA 17280 

GGCAAGCATT GGCAATCTTT ACCGTTGTCT ACATGTCGCT ATTTTAGCAT GTACAAGAGT 17340 

TATTGGAATA ATATCCAATA TTCTTCGTCA AAAGAGATAA AATAAATGTC TTTTCTTCCC 17400 

GTAATTATGG CTGGCGGCAC AGGTAGCCGT TTATGGCCGC TTTCACGCGA ATATCATCCG 17460 

AAGCAGTTTC TAAGCGTTGA AGGTAAACTA TCAATGCTGC AAAATACTAT AAAGCGATTA 17520 

GCTTCACTTT CTACAGAAGA ACCCGTTGTC ATTTGCAATG ACAGACACCG TTTCTTAGTC 17580 

GCTGAACAAC TCCGTGAAAT TGACAAGTTA GCAAATAATA TTATTCTCGA ACCGGTAGGC 17640 

CGTAATACTG CACCAGCGAT CGCTCTTGCC GCGTTTTGTG CGCTCCAGAA TGCTGATAAT 17700 

GCTGATCCTC TTTTGTTGGT TCTTGCTGCA GATCATGTGA TTCAGGATGA AATAGCTTTT 17760 

ACGAAAGCTG TCAGACATGC TGAAGAATAC GCTGCAAATG GTAAGCTTGT AACTTTTGGT 17820 

ATTGTTCCAA CGCATGCTGA AACGGGTTAT GGATATATTC GTCGTGGTGA GTTGATAGGA 17880 

AATGACGCTT ATGCAGTGGC TGAATTTGTG GAGAAACCGG ATATCGATAC CGCCGGTGAC 17940 

TATTTCAAAT CAGGGAAATA TTACTGGAAT AGCGGTATGT TTTTATTTCG TGCAAGCTCT 18000 

TATTTAAACG AATTAAAGTA TTTATCACCT GAAATTTATA AAGCTTGTGA AAAGGCGGTA 18060 

GGACATATAA ATCCCGATCT TGATTTTATT CGTATTGATA AAGAAGAGTT TATGTCATGC 18120 

CCGAGTGATT CTATCGATTA TGCAGTTATG GAGCACACAC AGCATGCGGT GGTGATACCA 18180 

ATGAGCGCTG GCTGGTCGGA TGTGGGTTCC TGGTCCTCAC TTTGGGATAT ATCGAATAAA 18240 
GATCATCAGA GAAATGTTTT AAAAGGAGAT ATTTTCGCAC ATGCTTGTAA TGATAATTAC 18300 

ATTTATTCCG AAGATATGTT TATAAGTGCG ATTGGTGTAA GCAATCTTGT CATTGTTCAA 18360 

ACAACAGACG CTTTACTGGT GGCTAATAAA GATACAGTAC AAGATGTTAA AAAAATTGTC 18420 

GATTATTTAA AACGGAATGA TAGGAACGAA TATAAACAAC ATCAAGAAGT TTTCCGCCCC 18480 

TGGGGAAAAT ATAATGTGAT TGATAGCGGC AAAAATTACC TCGTTCGATG TATCACTGTT 18540 

AAGCCGGGTG AGAAATTTGT GGCGCAGATG CATCACCACC GGGCTGAGCA TTGGATAGTA 18600 

TTATCCGGGA CTGCTCGTGT TACAAAGGGA GAGCAGACTT ATATGGTTTC TGAAAATGAA 18660 

TCAACATTTA TTCCTCCGAA TACTATTCAC GCGCTGGAAA ATCCTGGAAT GACCCCCCTG 18720 

AAGTTAATTG AGATTCAATC AGGTACCTAT CTTGGTGAGG ATGATATTAT TCGTTTAGAA 18780 

CAACGTTCTG GATTTTCGAA GGAGTGGACT AATGAACGTA GTTAATAATA GCCGTGATGT 18840 

TATTTATTCA TCAGGTATTG TGTTTGGAAC GAGTGGGGCT CGCGGTCTTG TAAAAGATTT 18900 

TACACCTCAG GTATGTGCTG CTTTTACGGT TTCATTTGTT GCCGTTATGC AGGAACATTT 18960 

TTCCTTTGAT ACCGTAGCAT TGGCAATAGA TAATCGTCCA AGTAGTTATG GGATGGCTCA 19020 

GGCGTGTGCT GCTGCATTGG CGGATAAAGG CGTTAACTGT ATTTTTTATG GAGTGGTACC 19080 

AACCCCAGCT TTGGCCTTTC AGTCTATGTC TGACAATATG CCTGCGATAA TGGTTACGGG 19140 

AAGTCATATT CCATTCGAGC GGAACGGCCT CAAGTTTTAT CGTCCTGATG GTGAAATCAC 19200 

GAAACATGAT GAGGCTGCGA TCCTTAGTGT TGAAGATACG TGCAGCCATT TAGAGCTTAA 19260 

AGAACTCATA GTTTCAGAAA TGGCTGCTGT TAATTATATA TCTCGTTATA CATCTTTATT 19320 
TTCTACTCCA TTCCTGAAAA ATAAGCGTAT TGGTATTTAC GAACATTCAA GCGCTGGGCG 19380 
TGATCTTTAT AAGCCTTTAT TTATTGCATT GGGGGCTGAA GTCGTTAGCT TGGGTAGAAG 19440 
CGATAATTTT GTACCTATAG ATACAGAGGC TGTAAGCAAA GAGGATCGGG AAAAAGCTCG 19500 
CTCATGGGCT AAAGAGTTCG ATTTAGATGC CATATTCTCG ACAGATGGGG ATGGTGATCG 19560 
CCCTCTTATT GCTGATGAGG CCGGTGAGTG GCTAAGAGGC GATATACTAG GTCTATTATG 19620 
TTCACTTGCA TTGGATGCAG AAGCCGTCGC TATTCCTGTT AGTTGTAACA GCATAATTTC 19680 
TTCTGGCCGC TTTTTTAAAC ATGTTAAGCT TACAAAAATT GGCTCGCCTT ATGTTATCGA 19740 
AGCTTTTAAT GAATTATCGC GGAGTTATAG TGGTATTGTC GGTTTTGAAG CCAATGGCGG 19800 
TTTTTTATTA GGAAGCGACA TCTGTATTAA CGAGCAGAAT CTTCATGCCT TACCAACTCG 19860 
TGATGCTGTA TTACCAGCAA TAATGCTGCT TTACAAAAGT AGGAATACCA GCATTAGCGC 19920 
TTTAGTCAAT GAACTCCCAA CTCGTTACAC CCATTCTGAC AGATTACAGG GGATTACAAC 19980 
TGATAAAAGT CAATCCTTAA TTAGTATGGG CAGAGAAAAT CTGAGCAACC TCTTAAGCTA 20040 
TATTGGTTTG GAGAATGAAG GTGCAATTTC TACAGATATG ACAGATGGTA TGCGAATTAC 20100 
TTTACGTGAT GGATGTATTG TGCATTTGCG GGCTTCTGGT AATGCACCTG AGTTACGCTG 20160 
CTATGCAGAA GCTAATTTAT TAAATAGGGC TCAGGATCTT GTAAATACAA CGCTTGCTAA 20220 
TATTAAAAAA CGATGCTTGC TGTAAAAAAA TTGAATGTTA TTTACTTAAT ATGCCTATTT 20280 
TATTTACATT ATGCACGGTC AGAGGGTGAG GATTAAATGG ATAATATTGA TAATAAGTAT 20340 

AATCCACAGC TATGTAAAAT TTTTTTGGCT ATATCGGATT TGATTTTTTT TAATTTAGCC 20400 

TTATGGTTTT CATTAGGATG TGTCTATTTT ATTTTTGATC AAGTACAGCG ATTTATTCCT 20460 

CAAGACCAAT TAGATACAAG AGTTATTACG CATTTTATTT TGTCAGTAGT ATGTGTCGGT 20520 

TGGTTTTGGA TTCGTTTGCG ACATTATACT ATCCGCAAGC CATTTTGGTA TGAGTTAAAA 20580 

GAAATTTTTC GTACGATCGT TATTTTTGCT ATATTTGATT TGGCTCTGAT AGCGTTTACA 20640 

AAATGGCAGT TTTCACGCTA TGTCTGGGTG TTTTGTTGGA CTTTTGCCCT AATCCTGGTG 20700 

CCTTTTTTTC GCGCACTTAC AAAGCATTTA TTGAACAAGC TAGGTATCTG GAAGAAAAAA 20760 

ACTATCATCC TGGGGAGCGG ACAGAATGCT CGTGGTGCAT ATTCTGCGCT GCAAAGTGAG 20820 

GAGATGATGG GGTTTGATGT TATCGCTTTT TTTGATACGG ATGCGTCAGA TGCTGAAATA 20880 

AATATGTTGC CGGTGATAAA GGATACTGAG ATTATTTGGG ATTTAAATCG TACAGGTGAT 20940 

GTCCATTATA TCCTTGCTTA TGAATACACC GAGTTGGAGA AAACACATTT TTGGCTACGT 21000 

GAACTTTCAA AACATCATTG TCGTTCTGTT ACTGTAGTCC CCTCGTTTAG AGGATTGCCA 21060 

TTATATAATA CTGATATGTC TTTTATCTTT AGCCATGAAG TTATGTTATT AAGGATACAA 21120 

AATAACTTGG CTAAAAGGTC GTCCCGTTTT CTCAAACGGA CATTTGATAT TGTTTGTTCA 21180 

ATAATGATTC TTATAATTGC ATCACCACTT ATGATTTATC TGTGGTATAA AGTTACTCGA 21240 

GATGGTGGTC CGGCTATTTA TGGTCACCAG CGAGTAGGTC GGCATGGAAA ACTTTTTCCA 21300 

TGCTACAAAT TTCGTTCTAT GGTTATGAAT TCTCAAGAGG TACTAAAAGA ACTTTTGGCT 21360 

AACGATCCTA TTGCCAGGGC TGAATGGGAG AAAGATTTTA AACTGAAAAA TGATCCTCGA 21420 

ATCACAGCTG TAGGTCGATT TATACGTAAA ACTAGCCTTG ATGAGTTGCC ACAACTTTTT 21480 

AATGTACTAA AAGGTGATAT GAGCCTGGTT GGACCACGAC CTATCGTTTC GGATGAACTG 21540 

GAGCGTTATT GTGATGATGT TGATTATTAT TTGATGGCAA AGCCGGGCAT GACAGGTCTA 21600 

TGGCAAGTGA GTGGGCGTAA TGATGTTGAT TATGACACTC GTGTTTATTT TGATTCCTGG 21660 

TATGTTAAAA ACTGGACGCT TTGGAATGAT ATTGCCATTC TGTTTAAAAC AGCGAAAGTT 21720 

GTTTTGCGGC GAGATGGTGC GTATTAAGCT TACCGAGAAG TACTGAATAA TAATTGTATA 21780 

AATTAGCCTG CGTAAAATCT GAACGCATCA ATCGCTACCT TAATATCATA CCTTTGAGTT 21840 

AACATACTAT TCACCTTTAA CCTGCCATGA CCGTTTGTGG CAGGGTTTCC ACACCTGACA 21900 

GGAGTATGTA ATGTCCAAGC AACAGATCGG CGTCGTCGGT ATGGCAGTGA TGGGGCGCAA 21960 

CCTCGCGCTC AACATCGAAA GCCGTGGTTA TACCGTCTCC GTTTTCAACC GCTCCCGTGA 22020 
AAAGACCGAA GAAGTGATTG CCGAGAATCC CGGCAAAAAG CTGGTGCCTT ATTACACGGT 22080 
THE CLAIMS: 

1. A nucleic acid molecule derived from: a gene 
encoding a transferase; or a gene encoding an enzyme for 
the transport or processing of a polysaccharide or 

5 oligosaccharide unit, including a wzx gene or a- wzy gene, 
or a gene with a similar function; the gene being involved 
in the synthesis of a particular bacterial polysaccharide 
antigen, wherein the sequence of the nucleic acid molecule 
is specific to the particular bacterial polysaccharide 
10 antigen. 

2. A nucleic acid molecule derived from: a gene 
encoding a transferase; or a gene encoding an enzyme for 
the transport or processing of a polysaccharide or 

15 oligosaccharide unit such as a wzx or wzy gene; the gene 
being involved in the synthesis of a particular bacterial 
O antigen, wherein the sequence of the nucleic acid 
molecule is specific to the particular bacterial O 
antigen. 

3. A nucleic acid molecule derived from: a gene 
encoding a transferase; or a gene encoding an enzyme for 
the transport or processing of a polysaccharide or 
oligosaccharide unit such as a wzx or wzy gene; the gene 

25 being involved in the synthesis of an O antigen expressed 
by E. coJLi, wherein the sequence of the nucleic acid 
molecule is specific to the O antigen. 

4. A nucleic acid molecule derived from: a gene 
encoding a transferase; or a gene encoding an enzyme for 
the transport or processing of a polysaccharide or 
oligosaccharide unit such as a wzx or wzy gene; the gene 
being involved in the synthesis of an O antigen expressed 
by S. enterica, wherein the sequence of the nucleic acid 
molecule is specific to the O antigen. 

5. A nucleic acid molecule according to any one 
of claims 1 to 4 wherein the nucleic acid molecule is 
approximately 10 to 20 nucleotides in length. 

6. A nucleic acid molecule derived from a gene, 
the gene being selected from a group consisting of the 
following sequences: 

nucleotide position 739 to 1932 of SEQ ID NO: 1; 
nucleotide position 8646 to 9911 of SEQ ID NO: 1; 
nucleotide position 9901 to 10953 of SEQ ID NO: 1; 
nucleotide position 11821 to 12945 of SEQ ID NO: 1; 
nucleotide position 79 to 861 of SEQ ID NO: 2; 
nucleotide position 858 to 2042 of SEQ ID NO: 2; 
nucleotide position 2011 to 2757 of SEQ ID NO: 2; 
nucleotide position 2744 to 4135 of SEQ ID NO: 2; 
nucleotide position 5257 to 6471 of SEQ ID NO: 2; and 
nucleotide position 13156 to 13821 of SEQ ID NO: 2; 
which nucleic acid molecule is capable of hybridizing to 
complementary sequence from said gene. 
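
The nucleotide positions recited above can be read as 1-based, inclusive coordinates into the listed sequence. The following is a minimal sketch under that assumption; the function names and the 20-base toy string are hypothetical illustrations and are not part of the sequence listing.

```python
# Sketch only: assumes positions are 1-based and inclusive.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def extract_region(sequence: str, start: int, end: int) -> str:
    """Return the bases from position start to position end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


# Hypothetical toy input, used only to show the coordinate convention.
toy = "GAATTCGGGAGGCGCAATGA"
region = extract_region(toy, 3, 8)
print(region, reverse_complement(region))
```

Under this convention a region recited as position 739 to 1932 spans 1932 - 739 + 1 = 1194 bases.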



7. A nucleic acid molecule which is any one of 
the oligonucleotides in Table 5 or 5A, with respect to the 
genes wbdH, wzx, wzy and wbdM. 



8. A nucleic acid molecule which is any one of 
the oligonucleotides in Table 6 or 6A. 

9. A nucleic acid molecule derived from a gene, 
the gene being selected from a group consisting of the 
following sequences: 

nucleotide position 1019 to 2359 of SEQ ID NO: 3; 
nucleotide position 2352 to 3314 of SEQ ID NO: 3; 
nucleotide position 3361 to 3875 of SEQ ID NO: 3; 
nucleotide position 3977 to 5020 of SEQ ID NO: 3; 
nucleotide position 5114 to 6313 of SEQ ID NO: 3; 
nucleotide position 6313 to 7323 of SEQ ID NO: 3; 
nucleotide position 7310 to 8467 of SEQ ID NO: 3; 
nucleotide position 12762 to 14054 of SEQ ID NO: 4; and 
nucleotide position 14059 to 15060 of SEQ ID NO: 4; 
which nucleic acid molecule is capable of hybridizing to 
complementary sequences from said gene. 

10. A nucleic acid molecule which is any one of 
the oligonucleotides in Table 7. 

11. A nucleic acid molecule which is any one of 
the oligonucleotides in Table 8 with respect to the genes 
wzx and wbaV. 



12. A method of testing a sample for the presence 
of one or more bacterial polysaccharide antigens, the 
method comprising the following steps: 

(a) contacting the sample with at least one 
oligonucleotide molecule capable of specifically 
hybridising to: (i) a gene encoding a transferase, or (ii) 
a gene encoding an enzyme for transport or processing of 
oligosaccharide or polysaccharide units, including a wzx 
or wzy gene; wherein said gene is involved in the 
synthesis of the bacterial polysaccharide antigen; under 
conditions suitable to permit the at least one 
oligonucleotide molecule to specifically hybridise to at 
least one such gene of any bacteria expressing the 
bacterial polysaccharide antigen present in the sample and 

(b) detecting any specifically hybridised oligonucleotide 
molecules. 
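
Specific hybridisation of an oligonucleotide to a gene can be approximated in silico by searching for perfect base pairing on either strand. The sketch below is an illustration only: it assumes an exact match stands in for hybridisation (real specificity depends on stringency conditions, which are not modelled), and the function names are hypothetical. The 20-base target is the first 20 bases of SEQ ID NO: 4 as listed above.

```python
# Sketch only: exact-match search as a simplified stand-in for hybridisation.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def find_all(haystack: str, needle: str) -> list:
    """Return the 1-based start positions of every (overlapping) occurrence."""
    hits = []
    index = haystack.find(needle)
    while index != -1:
        hits.append(index + 1)
        index = haystack.find(needle, index + 1)
    return hits


def probe_sites(top_strand: str, probe: str) -> dict:
    """Report where a probe would pair perfectly with each strand of a duplex.

    A probe identical to part of the top strand pairs with the bottom strand;
    a probe whose reverse complement occurs in the top strand pairs with the
    top strand itself. Positions are given on the top strand.
    """
    top_strand, probe = top_strand.upper(), probe.upper()
    return {
        "binds_bottom_strand": find_all(top_strand, probe),
        "binds_top_strand": find_all(top_strand, reverse_complement(probe)),
    }


target = "GAATTCGGGAGGCGCAATGA"  # first 20 bases of SEQ ID NO: 4
print(probe_sites(target, "GGGAGGCG"))
```

An empty result on both strands would correspond, in this simplified model, to a sample lacking the target gene.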



13. The method according to claim 12, the method 
further comprising contacting the sample with a further at 
least one oligonucleotide molecule capable of specifically 
hybridising to at least one sugar pathway gene under 
conditions suitable to permit the further at least one 
oligonucleotide molecule to specifically hybridise to at 
least one such sugar pathway gene of any bacteria 
expressing the bacterial polysaccharide antigen present in 
the sample and detecting any specifically hybridised 
oligonucleotide molecules. 

14. A method of testing a sample for the presence 
of one or more bacterial polysaccharide antigens, the 
method comprising the following steps: 

(a) contacting the sample with at least one pair of 
oligonucleotide molecules, with at least one 
oligonucleotide molecule of the pair capable of 
specifically hybridising to: (i) a gene encoding a 
transferase, or (ii) a gene encoding an enzyme for 
transport or processing of oligosaccharide or 
polysaccharide units, including a wzx or wzy gene; wherein 
the gene is involved in the synthesis of the bacterial 
polysaccharide antigen; under conditions suitable to 
permit the at least one oligonucleotide molecule of the 
pair of molecules to specifically hybridise to at least 
one such gene of any bacteria expressing the bacterial 
polysaccharide antigen present in the sample and 

(b) detecting any specifically hybridised oligonucleotide 
molecules. 



15. The method according to claim 14, the method 
further comprising contacting the sample with a further at 
least one pair of oligonucleotide molecules, with at least 
one oligonucleotide molecule of the pair capable of 
specifically hybridising to at least one sugar pathway 
gene under conditions suitable to permit the further at 
least one oligonucleotide molecule of the pair to 
specifically hybridise to at least one such sugar pathway 
gene of any bacteria expressing the bacterial 
polysaccharide antigen present in the sample and detecting 
any specifically hybridised oligonucleotide molecules. 

16. A method of testing a sample for the presence 
of one or more bacterial O antigens, the method comprising 
the following steps: 

(a) contacting the sample with at least one 
oligonucleotide molecule capable of specifically 
hybridising to: (i) a gene encoding an O antigen 
transferase, or (ii) a gene encoding an enzyme for 
transport or processing of the oligosaccharide or 
polysaccharide units, including a wzx or wzy gene; wherein 
said gene is involved in the synthesis of the bacterial O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule to specifically hybridise to 
at least one such gene of any bacteria expressing the 
bacterial O antigen present in the sample and 

(b) detecting any specifically hybridised oligonucleotide 
molecules. 

17. The method according to claim 16, the method 
further comprising contacting the sample with a further at 
least one oligonucleotide molecule capable of specifically 
hybridising to at least one sugar pathway gene under 
conditions suitable to permit the further at least one 
oligonucleotide molecule to specifically hybridise to at 
least one such sugar pathway gene of any bacteria 
expressing the bacterial O antigen present in the sample 
and detecting any specifically hybridised oligonucleotide 
molecules. 

18. The method according to claim 16 or 17 wherein 
the O antigen is expressed by E. coli or S. enterica. 

19. The method according to claim 18 wherein the 
E. coli express the O157 O antigen serotype or the O111 O 
antigen serotype. 

20. The method according to claim 18 wherein the 
S. enterica express the C2 or B O antigen serotype. 

21. The method according to any one of claims 16 
to 20 wherein the specifically hybridised oligonucleotide 
molecules are detected by Southern blot analysis. 

22. A method of testing a sample for the presence 
of one or more bacterial O antigens, the method comprising 
the following steps: 

(a) contacting the sample with at least one pair of 
oligonucleotide molecules, with at least one 
oligonucleotide molecule of the pair being capable of 
specifically hybridising to: (i) a gene encoding an O 
antigen transferase, or (ii) a gene encoding an enzyme for 
transport or processing of oligosaccharide or 
polysaccharide units, including a wzx or wzy gene; wherein 
the gene is involved in the synthesis of the bacterial O 
antigen; under conditions suitable to permit the at least 
one oligonucleotide molecule of the pair of molecules to 
specifically hybridise to at least one such gene of any 
bacteria expressing the bacterial O antigen present in the 
sample and 

(b) detecting any specifically hybridised oligonucleotide 
molecules. 

23. The method according to claim 22, the method 
further comprising contacting the sample with a further at 
least one pair of oligonucleotide molecules, with at least 
one oligonucleotide molecule of the pair capable of 
specifically hybridising to at least one sugar pathway 
gene under conditions suitable to permit the further at 
least one oligonucleotide molecule of the pair to 
specifically hybridise to at least one such sugar pathway 
gene of any bacteria expressing the bacterial O antigen 
present in the sample and detecting any specifically 
hybridised oligonucleotide molecules. 

24. The method according to claim 22 or 23 wherein 
the O antigen is expressed by E. coli or S. enterica. 

25. The method according to claim 24 wherein the 
E. coli are O111 or the O157 O antigen serotype. 

26. The method according to claim 24 wherein the 
S. enterica express the C2 or B O antigen serotype. 

27. The method according to any one of claims 22 
to 26 wherein the method is performed according to the 
polymerase chain reaction method. 

28. The method according to any one of claims 22 
to 26 wherein the oligonucleotide molecules are selected 
from the group of nucleic acid molecules according to any 
one of claims 5 to 11. 

29. A method for testing a food derived sample for 
the presence of one or more particular bacterial O 
antigens, the method being according to any one of claims 
16 to 28. 

30. A method for testing a faecal derived sample for 
the presence of one or more particular bacterial O 
antigens, the method being according to any one of claims 
16 to 28. 

31. A method for testing a sample derived from a 
patient for the presence of one or more particular 
bacterial O antigens, the method being according to any 
one of claims 16 to 28. 

32. A kit comprising a first vial containing a first 
nucleic acid molecule capable of specifically hybridising 
to: (i) a gene encoding a transferase, or (ii) a gene 
encoding an enzyme for transport or processing 
oligosaccharide or polysaccharide units, including a wzx 
or wzy gene, wherein said gene is involved in the 
synthesis of a bacterial polysaccharide. 

33. The kit according to claim 32 further comprising 
in the first vial, or in a second vial, a second nucleic 
acid molecule capable of specifically hybridising to: (i) 
a gene encoding a transferase, or (ii) a gene encoding an 
enzyme for transport or processing oligosaccharide or 
polysaccharide units, including a wzx or wzy gene, wherein 
said gene is involved in the synthesis of a bacterial 
polysaccharide, and wherein the sequence of the second 
nucleic acid molecule is different from the sequence of 
the first nucleic acid molecule. 

34. The kit according to claim 33 further comprising 
a nucleic acid molecule derived from a sugar pathway gene. 

35. A kit according to claim 32 further comprising 
in the first vial, or in a second vial, a second nucleic 
acid molecule capable of specifically hybridising to a 
sugar pathway gene. 

36. A kit according to any one of claims 32 to 35 
wherein the nucleic acid molecules are approximately 10 to 
20 nucleotides in length. 

37. A kit comprising a first vial containing a 
first nucleic acid molecule capable of specifically 
hybridising to: (i) a gene encoding a transferase, or (ii) 
a gene encoding an enzyme for transport or processing 
oligosaccharide or polysaccharide units, including a wzx 
or wzy gene, wherein said gene is involved in the 
synthesis of a bacterial O antigen. 

38. The kit according to claim 37, further 
comprising in the first vial, or in a second vial, a 
second nucleic acid molecule capable of specifically 
hybridising to: (i) a gene encoding a transferase, or (ii) 
a gene encoding an enzyme for transport or processing 
oligosaccharide or polysaccharide units, including a wzx 
or wzy gene, wherein said gene is involved in the 
synthesis of a bacterial O antigen, and wherein the 
sequence of the second nucleic acid molecule is different 
from the sequence of the first nucleic acid molecule. 

39. A kit according to claim 37 further comprising 
in the first vial, or in a second vial, a second nucleic 
acid molecule capable of specifically hybridising to a 
sugar pathway gene. 

40. The kit according to claim 38 further comprising 
a nucleic acid molecule derived from a sugar pathway gene. 

41. The kit according to any one of claims 37 to 40 
wherein the nucleic acid molecules are approximately 10 to 
20 nucleotides in length. 

42. The kit according to any one of claims 31 to 34 
wherein the first and second nucleic acid molecules are 
according to any one of claims 5 to 11. 
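
As a rough illustration of the oligonucleotide properties recited in the kit and method claims, the sketch below checks the 10 to 20 nucleotide length range of claims 36 and 41 and looks for sites of perfect base pairing between an oligonucleotide and either strand of a target sequence. This is only a simplified stand-in: actual specific hybridisation depends on the reaction conditions and can tolerate mismatches, which the sketch ignores. All function names are hypothetical, and the example sequence is simply the first 20 nucleotides of the listing that follows.

```python
# Watson-Crick complements for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_claimed_length(oligo: str, shortest: int = 10, longest: int = 20) -> bool:
    """True if the oligonucleotide is within the recited length range."""
    return shortest <= len(oligo) <= longest


def perfect_match_sites(oligo: str, target: str) -> list:
    """List (position, strand) pairs where the oligo pairs perfectly.

    Positions are 1-based on the target as written. Strand "+" means the
    oligo has the same sequence as the written strand (so it would anneal
    to the opposite strand); "-" means its reverse complement occurs in
    the written strand (so it would anneal to the written strand).
    """
    oligo = oligo.upper()
    target = target.upper()
    sites = []
    for strand, probe in (("+", oligo), ("-", reverse_complement(oligo))):
        start = target.find(probe)
        while start != -1:
            sites.append((start + 1, strand))
            start = target.find(probe, start + 1)
    return sorted(sites)


target = "GATCTGATGGCCGTAGGGCG"
forward = "TGATGGCCGT"
reverse = reverse_complement(forward)
print(has_claimed_length(forward), perfect_match_sites(forward, target))
print(has_claimed_length(reverse), perfect_match_sites(reverse, target))
```

A primer pair of the kind used in the polymerase chain reaction method of claim 27 would normally consist of one "+" and one "-" oligonucleotide flanking the region of interest.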
GATCTGATGGCCGTAGGGCGCTACGTGCTTTCTGCTGATATCTGGGCTGAGTTGGAAAAA 60 

ACTGCTCCAGGTGCCTGGGGACGTATTCAACTGACTGATGCTATTGCAGAGTTGGCTAAA 120 

AAACAGTCTGTTGATGCCATGCTGATGACCGGCGACAGCTACGACTGCGGTAAGAAGATG 180 

GGCTATATGCAGGCATTCGTTAAGTATGGGCTGCGCAACCTTAAAGAAGGGGCGAAGTTC 240 

CGTAAGAGCATCAAGAAGCTACTGAGTGAGTAGAGATTTACACGTCTTTGTGACGATAAG 300 

CCAGAAAAAATAGCGGCAGTTAACATCCAGGCTTCTATGCTTTAAGCAATGGAATGTTAC 360 

TGCCGTTTTTTATGAAAAATGACCAATAATAACAAGTTAACCTACCAAGTTTAATCTGCT 420 

TTTTGTTGGATTTTTTCTTGTTTCTGGTCGCATTTGGTAAGACAATTAGCGTGAGTTTTA 480 

GAGAGTTTTGCGGGATCTCGCGGAACTGCTCACATCTTTGGCATTTAGTTAGTGCACTGG 540 

TAGCTGTTAAGCCAGGGGCGGTAGCTTGCCTAATTAATTTTTAACGTATACATTTATTCT 600 

TGCCGCTTATAGCAAATAAAGTCAATCGGATTAAACTTCTTTTCCATTAGGTAAAAGAGT 660 

GTTTGTAGTCGCTCAGGGAAATTGGTTTTGGTAGTAGTACTTTTCAAATTATCCATTTTC 720 

Start of orf1 

MLLCCIHINVYYLL 
CGATTTAGATGGCAGTTG ATGTTACTATGCTGCATACATATCAATGTATATTATTTACTT 780 

LECDMKKIVI IGNVASMMLR 
TTAGAATGTGATATGAAAAAAATAGTGATCATAGGCAATGTAGCGTCAATGATGTTAAGG 840 

FRKELIMNLVRQGDNVYCLA 
TTCAGGAAAGAATTAATCATGAATTTAGTGAGGCAAGGTGATAATGTATATTGTCTAGCA 900 

NDFSTEDLKVLS SWGVKGVK 
AATGATTTTTCCACTGAAGATCTTAAAGTACTTTCGTCATGGGGCGTTAAGGGGGTTAAA 960 

FSLNSKGINPFKDI I A V Y E L 
TTCTCTCTTAACTCAAAGGGTATTAATCCTTTTAAGGATATAATTGCTGTTTATGAACTA 1020 

KKILKDISPDIVFSYFVKPV 
AAAAAAATTCTTAAGGATATTTCCCCAGATATTGTATTTTCATATTTTGTAAAGCCAGTA 1080 

IFGTIASKLSKVPRIVGMIE 
ATATTTGGAACrATTGCTTCAAAGTTGTCAAAAGTGCCAAGGATTGTTGGAATGATTGAA 1140 

GLGNAFTYYKGKQTTKTKMI 
GGTCTAGGTAATGCCTTCACTTATTATAAGGGAAAGCAGACCACAAAAACTAAAATGATA 1200 

KWIQILLYKLALPMLDDLXL 
AAGTGGATACAAATTCTTTTATATAAGTTAGCATTACCGATGCTTGATGATTTGATTCTA 1260 

LNHDDKKDLIDQYNIKAKVT 
TTAAATCATGATGATAAAAAAGATTTAATCGATCAGTATAATATTAAAGCTAAGGTAACA 1320 

VLGGIGLDLNEFSYKEP PKE 
GTGTTAGGTGGGATTGGATTGGATCTTAATGAGTTTTCATATAAAGAGCCACCGAAAGAG 1380 

KITFIFIARLLREKGIFEFI 
AAAATTACCTTTATTTTTATAGCAAGGTTATTAAGAGAGAAAGGGATATTTGAGTTTATT 1440 

EAAKFVKTTYPSSEFVILGG 
GAAGCCGCAAAGTTCGTTAAGACAACTTATCCAAGTTCTGAATTTGTAATTTTAGGAGGT 1500 
PESNNPPSLQKNEIESLRKE 
TTTGAGAGTAATAATCCTTTCTCATTACAAAAAAATGAAATTGAATCGCTAAGAAAAGAA 

HDLIYPGHVENVQDWLEKS S 
CATGATCTTATTTATCCTGGTCATGTGGAAAATGTTCAAGATTGGTTAGAGAAAAGTTCT 

VFVLPTSYREGVPRVIQEAM 
GTTTTTGTTTTACCTACATCATATCGAGAAGGCGTACCAAGGGTGATCCAAGAAGCTATG 

AIGRPVIT TNVPGCRDI IND 
GCTATTGGTAGACCTGTAATAACAACTAATGTACCTGGGTGTAGGGATATAATAAATGAT 



TATTTTATTGAGAATAAAGATAAAGTACTCGAAATGGGGCTTGCTGGAAGGAAGTTTGCA 

EKNFDAFEKNNRLAS I IKSN 
GAAAAAAACTTTGATGCTTTTGAAAAAAATAATAGACTAGCATCAATAATAAAATCAAAT 



1560 
1620 
1680 
1740 
1800 
1860 
1920 



End of orf 1 



AATGATTTT rGACTTGAGCAGAAATTATTTATATTTCAATCTGAAAAATAAAGGCTGTTA 1980 
Start of orf2 

MNKVAL ITGI TGQDGSYLA 
TTATSAATAAAGTGGCATTAATTACTGGTATCACTGGGCAAGATGGCTCCTATTTGGCAG 2040 

ELLLEKGYEVHGIKRRASSF 
AATTATTGTTAGAAAAAGGTTATGAAGTTCATGGTATTAAACGCCGTGCATCTTCATTTA 2100 

WTERVDHIYQDSHLANPKLF 
ATACTGAGCGAGTGGATCACATCTATCAGGATTCACATTTAGCTAATCCTAAACTTTTTC 2160 

LHYGDLTDTSNLTRILKEVQ 
TACACTATGGCGATTTGACAGATACTTCCAATCTGACCCGTATTTTAAAAGAAGTTCAAC 2220 

PDEVYNLGAMSHVAVSFESP 
CAGATGAAGTTTACAATTTGGGGGCGATGAGCCATGTAGCGGTATCATTTGAGTCACCAG 2280 

EYTADVDAIGTLRLLEAIRI 
AATACACTGCTGATGTTGATGCGATAGGAACATTGCGTCTTCTTGAAGCTATCAGGATAT 2340 

LGLEKKTKFYQASTSELYGL 
TGGGGCTGGAAAAAAAGACAAAATTTTATCAGGCTTCAACTTCAGAGCTTTATGGTTTGG 2400 

VQEIPQKETTPFYPRSPYAV 
TTCAAGAAATTCCACAAAAAGAGACTACGCCATTTTATCCACGTTCGCCTTATGCTGTTG 2460 

AKLYAYWITVNYRESYGMFA 
CAAAATTATATGCCTATTGGATCACTGTTAATTATCGTGAGTCTTATGGTATGTTTGCCT 2520 

CNGILFNHESPRRGETFVTR 
GCAATGGTATTCTCTTTAACCACGAATCACCTCGCCGTGGCGAGACCTTTGTTACTCGTA 2580 

KITRGIANIAQGLDKCLYLG 
AAATAACACGCGGGATAGCAAATATTGCTCAAGGTCTTGATAAATGCTTATACTTGGGAA 2640 

NMDSLRDWGHAKDYVKMQWM 
ATATGGATTCTCTGCGTGATTGGGGACATGCTAAGGATTATGTCAAAATGCAATGGATGA 2700 
MLQQ.ETPEDFVIATGIQYSV 
TGCTGCAGCAAGAAACTCCAGAAGATTTTGTAATTGCTAC AGGAATTCAATATTCTGTCC 2760 

REFVTMAAEQVG I ELAF EG E 
GTGAGTTTGTCACAATGGCGGCAGAGCAAGTAGGCATAGAGTTAGCATTTGAAGGTGAGG 2820 

GVNEKGVVVSVNGTDAKAVN. 
GAGTAAATGAAAAAGGTGTTGTTGTTTCGGTCAATGGCACTGATGCTAAAGCTGTAAACC 2880 

PGDVI ISVDPRYFRPAEVET 
CGGGCGATGTAATTATATCTGTAGATCCAAGGTATTTTAGGCCTGCAGAAGTTGAAACCT 2940 

LLGDPTNAHKKLGWSPEITL 
TGCTTGGCGATCCTACTAATGCGCATAAAAAATTAGGATGGAGCCCTGAAATTAC ATTGC 3000 

REMVKEMVS SDLAIAKKNVL 
GTGAAATGGTAAAAGAAATG GTTTGCAGCGATTTAGCAATAGCGAAAAAGAACGTCTTGC 3060 

End of orf 2 

LKANNIATNIPQE* 

TGAAAGCTAATAAOAlTOCCACTAATATT'CCGCAAGAArAaAAAAGATAATACATTAAAT 3120 

Start of orf 3 

M F 

AATTAAAAATGGTGCTAGATTTATOAGTACCATTATTTTT?TTTTGGGTCACTAATO ?y?A 3180 

ITSDKFREI IKLVPL.VSIDL 
T?T>ACA'KIAGATAAATTTAGAGAAATTArrCAAGTTAGTTCCATTAGTATCAATTGATC'TGC 3240 

LIENENGEYLFGLRNNRPAK 
TAATTOAAAACGAGAATGGTGAATATTTAT'TTGGTCTTAGGAA'rAA'rCGACCGGCCAAAA 3300 

NYFFVPGGRIRKNESIKNAF 
ATOATOOTTTTOTTCCAGGTGGTAGGATTO^ 3360 

KRISSMELGKEYGISGSVFN 
AAAGAATATCATGTATOGAAT'TAGGa?AAAGAGTATGGTATT i rCAGGAAGTG l r'r l rT i rAA'I , G 3420 

GVWEHFYDDGFFSEGEATHY 



VLCYTL.KV 



I» N 1» P D 



ATCGTGAATACCTTTG 



Q D V H N 



End of orf 3 Start of orf 4 

, * M 
<3 TAaiTTTTAOTAAAAATTAATATGCGAGAGAATTOT AgGT 



Q C L Y 



CTGAATGTCCTTACCCTGTAACTATTGCCGGAGGAAGCGGAAGCCGTCTA ' rGGCCG'r'EGT 



CTCGAGTATTATACCCTAAACAA'F 



T R L D G 



" lAGTTGGGGATTCTACAATG ' PrGGAAA 



DHRFIVAEQLRQIGKLTKNI 
ATCACGGATTTATTGTAGeAGAGCAATTAGGACAGATTGG'TAAGG'rAACCAAGAA'rAOTA 



TACTTGAGGCGAAAGGCCGTAATACTGCACGTGCGATAGGTTTAGG'rGC'P 



3540 
3600 

3660 
3720 
3780 
3840 
3900 
3960 
AGAAGAATAATGCTAATGACGACCCTTTATTATTAGTACTTGCGGCAGACCACTCTAT 



TTTCGAGAGTCAATAATAAAAGGTATGCCGTATGCAAGTTCTGGGA 



D T A N 



EKPDVKTAQEYIS SGNYYWN 
AAAAACCAGATGTTAAAACAGCACAGGAA'rA'gAT'ra'CGAGTGGGAAT'rA'IvrAC'rGGAA ' rA 



OTATTTCGCGCCAGTAAATATC 



DIYHSCECATATANIDMD 
ATATT'PATCA'I ' AGCTGTGAATGTGCAACCGCTAGAGCAAATATAGATA'PGGACI 1 



E A E P 



K. T K D A V 



I G W N D V 



GGTCATCACOTTGGGATATAAGeCAAAAGGATOOCCmTGGTAATOTOTOCCATGGGGATO 

VLNHDGENSF IYSES SLVAT 
TCCTCAATCATCATGGAGAAAA ' TAgT'r'rTATTTACTC'PGAG ' rCAAGTCTCGTTGCGACAG 

VGVSNLVIVQTKDAVLVADR 
TeGGAGTAAGTAATTTAGTAACTGTGGAAACCAAGGATGCTO'PAe ' rGG^'rGCGGACCGTG 



YYMHRAVFRPWGKFDAIDQG 
ACTACATGCATGGTOCAGTTTTTCGCGGTTOGG<3TAAA?TGGATGCAATAGACCAAGGCG 

DRYRVKKIIVKJPGEGLiDLRM 
ATAGATATAGAGTAAAAAAAATAATAGTTAJ^CCAGGAGAAGGGTOAGATVFTAAGGA'TQC 



R A E H W I 



ATCATGATAGGGCAGAGGATTGGATTOTT 



SGTAKVSLG 



G A K Y 



GTGAAG OTAAACrrATTAGTTTCTAATOAGTCTATA'TA'rATCt 

slenpgviplhlievs sgdy 
gtcotgaoaatccaggggtaatagctttgcatctaatogaagtaagct:tggtoaotacc 

lesddivrftdrynskqflk 
togaatgagatgatatagtgggtt^actgagaqatataacagtaaacaaotgg'raaagc 



4020 

4080 

4140 

4200 

4260 

4320 

4380 

4440 

4500 

4560 

4620 

4680 

4740 

4800 

4860 

4920 

4980 

5040 

5100 



End of orf 4 Start of orf 5 

MN KITCFKA 

R D * 

GAGATTOATAAATATgAATAAAATAACTOGGTOCAAAGCAl 
TCGTGCTGAAT'rGAA'rCAa ' GAAA'rAGCA'rATAGAAT'rGGTCGCGCTTATGGTGAGTTTTT 

KPQTVVVGGDARLTSESLKK 
TAAACCTCAAACTGTAGTTGTGGGAGGAGATGCTCGCTTAACAAGTGAGAGTTTAAAGAA 

SLSNGLCDAGVNVLDLGMCG 
ATCACTCTCAAATGGGCTATCTGATGCAGGCGTAAATGTCTTAGATCTTGGAATGTCTGG 



?ACTGAAGAGATATATTTTTCCACT ' 



rTAGGAATTGATGGTGGAATCGAGGTAAC 



ASHNPIDYNGMKLVTKGARP 
TGCAAGCCATAAI'CGAATTGArPTATAATGGAATGAAATTAGTAACCAAAGGTGCTCGACC 

ISSDTGLKDIQQLVESNNFE 
AATCAGCAGTCACACAGGTCTCAAAGATATACAACAATTAGTAGAGAGTAATAATTTTGA 

ELNLEKKGNI TKYSTRDAYI 
AGAGCTCAACCTAGAAAAAAAAGGGAJlTATTACCAj ' iATATTCCACCCGAGATGCCTACAT 

NHLMGYANLQKIKKIKIVVN 
JiAATCATTTOATCGGCTATGCTAATCTGCAAAAAATAAAAAAAATCAAAATAGTTG'rGAA 

SGNGAAGPVIDAIEECFLRN 
TTCTGGGAATGGTGCAOCTGGTCCTGOTATTGATGCTATTCAGGAATGCTTTTTAGGGAA 



CAATATTCCGATTCAGTTTGTAAAAATAAATAATACACCCGATGGTAAT i r i rT?CCACATGG 

IPNPLLPECREDTSSAVIRH 
TATCCCTAATCCATTACTJiCCTGAGTGCAGAGAAGATACCAGCAGTGCGGTTATAAGACA 

SADFGIAFDGDFDRCF FFDE 
yAGTGCTGATTTTGGTA'rTGCATTTOATCGTGAT'TTTGATAGGTGTTOTTTCTTTGATGA 



AAATGGACAATTTATTGAAG 



Gx^ATATCGAiy i iCGCAAAAATCATTCATGATCCTCGCCTTATATGGAATACT, 

V ESHGGIPIMTKTGHAYIKQ 
CGTAGAAAGTCATGGTGGTATACCTATAATGACTAJiAACCGGTCATGCTTACATTAAGCA 

RMREEDAVYGGEMSAHH v,: '' If ' 
AAOAATGCCTOAAGAGGATCGCGTATATGGGGCCOAAATGACTGCGCATGAT 1 



ATTTTGCATACTGCGATAGTGGAATGATTCGT' 



'TCTGACAJU i iTAAAAAATTAGGTOAACTGG'rT'rG'rGGOTC'rA'rAAACGAC'rGGCGGGGAAG 

GEINCTLDNPQNEIDKLFNR 
T G OAOAJATAAACTGTAGACTACm.CAATCCGCAAAATGAAATAGATAAATTATTTAA'rCG 

KD SALAVDYTDGLTMEFSD 
gTACAAAGATAGTGCCTTAGCTGTTGATTACAGTGATGGATTAACTATGGAGTTCTCTGA 

FNVRCSNTEPVVRLNVES 
TTGGGOTTTTAATGTTAGATGCTCAAATACAGAACCTGTAGTACGATTGAATGTAGAATC - 

R NNAILMQEKTEEILNFISK 
■PAGGAATAATGCTATTCTTATGCAOGAAAAAACAGAAGAAATTCTGAATTTTA'PATGAAA 
End of orf5 Start of orf6 

> m . MKVLLTG 
A IE1AATTTG CAC CTGAGTTCATAATGGGAACAAGAAATATATjlAAAGTACTTCTGACTGG 

STGMVGKNI L EHD SAS KYNI 
CTCA/iCTGGCATGGTTGGTAAGAATATATTAGAGCATOATAGTGCAAGTAAATATAATAT 

jJL TPTSS DLNLLDKNEIEKPM 
ACTTACTCCAACCAGCTCTGATTTGAATTTATTASATAAAAATGAAATAGAAAAATTCAT 

L INMPDCI IHAAGLVGGIHA 
GCTTATCAACATGCCAGACTGTATTATACATGCAGCGGGATTAGTTGGAGGCATTCATGC 

N I S R P F D 
AAATATAAGCAGGCCGTTTGATT 

SVAKKLG IKKVLNLGS SCMY 
TTCCGTCGCAAAAAAACTAGGTATGAAGAAAGTGCTTAACI'TGGGTAGTTCATGCATGTA 

PKNFEEAIPEKALLTGELEE 
eCCCAAAAACTTTGAAGAGGCTATTCCTCAGAAAGCTCTGTTAACTCGTOAGCTAGAAGA 

T NEGYAIAKIAVAKACEYIS 
AACTAATGAGGGATATGCTATTGCGAAAATTGCTGTAGCAAAAGCATGCGAATATATATC 

RENSNYFYKTI IPCNLYGKY 
^GAOAAAACTCTAATTATTTTOATAAAACAATTATCCCATO 

DKFDDNSSHMIPAVIKKIHH 
^GATAAATTTGATGATAACTCGTCACATATGATTCCGGCAGTTATAAAAAAAATCCATCA 

AKINNVPEIEIWGDGNSRRE 
TGCGAAAATTAATAATGTCCCAGAGAl'CGAAATTTGGGGGGATGGTAATTCGCGCCGTGA 

FMYAEDLADL I Fvvt p v t 
GTTTATGTATGCAGAi ' iGATTTAGCTCATCTTATT? 



TATGTTATTCCTAAAATAGAATT 



CATGCCTAATO rTGG TAAAT G CTGGTTTAGGTTACGATTATTGAATTAATGACTATTATAA 

^,L.£— A EEIGYTG SF SHDLT KPT 
GATAATTGCAGAAGAAATTOGTTATACTGGGJiGTTTTTCTCATGATTTAACAAAACCAAC 

GMKRKLVDI S L LNKIGWS SH 
AGGAATGAAACGGAAGCTAGTAGATATT^GATTGGTTAATAAAATTGGTTCG'rCAAGTCA 

FELRDGIRKTYNYYLENQNK 
CTTTGAACTCAGJ1GAT&GCATGAOAA1AGACCTATAATTATTACTTGOAGAATCAAAATAA 
Start of orf7, End of orf6 



ATGATTACATACCCAC ' 



TGCTAGTAATACTTGGGATGAATATGAGTATGCAGCAATACAG 

SVIDSKMFTMGKKVELYEKN 
TCAGTAATT^CTGAAAAATCTCTACGATCOCTAAAAAGGCTGAGTOATATCAG^iAAAT 

FADLFGSKYAVMVS SGSTAN 
T - TTGCTGATTTCTTTGGTAGGAAATATOCCGTAJiTCGTTAGCTCTGGTTCTAGAGCTAAT 
ij L M I-A AL F FT-N K P KLKRG 'D E 
CTGTTAATGATT G CTGCCGTTTTCTTCACTAATAAACCAAAACTTAJJ ' AOAGGa'GATGAA 

I IVPAVSWSTTYYPLQQYGL 
ATAATAGTACCTGCAGTGI'CATGGTCTACGACATAI'TACCCTCTCCAACAGTATGGCTTA 



AAGGTGAAGTTTGTCGATATCAi ' iTAAAGAAACTTTAAATATTGA ' TATCGATAG'rT'rGAAA 

N A I S D K T K A I LTVNL LGNPN 
AATGCTATTTCAGATAAAACAAAAGCAATAT'rGACAGTAAA'rT'rAT'rAGGTAATCCTAA'r 

DFAKINEI INNRDIILLEDN 
GATTTTOCAAAAATAAATGAGATAATAAATAATAGGGATAOTATCT'TACTAGAAGA'rAAC 

CESMGAVFQNKQAGTFGVMG 
T G T G AGTCGATGGGCGCCGTCTTTCAAAATAAGCAGGCAGGCACATTCGGAGTTATGGGT 

TFSSFYSHHIATMEGGCVVT 
ACCTTTACTTCTTTTTACTCTCATCATATAGCTAGAATGGAAGGGGGCTGCCTAGTTACT 

D DEELYHVLLCLRAHGWTRN 
G ATGATGAAGAGCTGTATCATGTA ' PTGTTGTGCCTTCGAGCTCATGGTTGGACAAGAAAT 



LPKENMVTGTKSDDIFEESF 
TTACCAAAAGAGAATATGGTTACAGGCACTAAGAGTGATGATATTTTCGAAGAGTCGTT 



AAGTTTCTTTTACCAGGATACAATG'P 



GGCCCAGTTGAAATGAGTGGTGCTATTGGGATA 



EQLKKLPGFISTRR 
AGCAACTTAAAAAGTTACCAGGTTTTATATCCACCAGACG 1 



TCCAATGCACAATATTTT 
GTAGATAAATTTAAAGATCATCCATTCCTTGATATACAAAAAGAAG' 



8340 



TGGTTTGGTTTTTCCITCGTTATAAAGGAGGGAGCTGCTATTGAGAGGAAGAGTTTAGTA 

NNLISAGIECRPIVTGNFLK 
AJiTAATCTGATCTCAGCAGGCATTGAATGCCGACCA^iTTOTTACTGGGAATTTTCTCAAA 



AATGAACGTGTOTTOAGTTAT ' 



GAOTACTCTGTAGATGATACGGTAaCAAATGCCGAA 



yATATAGATAAGAATGGI" 



TTTGTGGGAAACCAGGAGATACCT 
End of orf 7 



G ATTAT CTACG AAAAGTATTAAAA ZtAACTAAC G AGG C ACTCTATTT GG AATAGAG TGC CT 



Start of orf 8 



G ASGG TATTAAC AGTG AAAAAAAI ' TT TAG CGTT 



E Q 



CTATTCTAAAGTACTACCAC 



CGGTTATTOAACAGTTTGTGAATCCAATTTGCATCTTCATTATCACACCACTAATACTGA 8760 



NHLGKQSYGNWI 
ACCACCTGGGTAAGCAAAGCTATGGTAATTGGft ?' 



PPATTAATTAG ' PATTGTATGTTTTT 
CTCAGTTAATATCTGGAGCATGTTCCGCATGGATTGCAAAAATCATTQ 



TTCTTAGTGATTTATCAAAi^AAAATGCTTTACGTCAAA'r'rTCCTATAATT'rT'r 



TTATTATCG CATTTGCGGTATTGATT 



TTATATTAAGTATT 



TTGCGAGGAATAATTCTTCATTCTTATTCGCGATTATTATTTGTGGrPTT'rT'r'rCAGGAAG 

VDNLFSGALKGFEKFNVSCF 
TTGATAATTTATTTAGTGGTGCGCTAAAAGGTTTTGAAAAATTTAATG'TA'rCATGTT'rTT 

FEVITRVLWASIVIYGIYGN 
TTGAAOTAATTACAAGAGTGCTCTGGGCTI'CTATAGTAATATATGGCAT'JTACGGAAAa'G 

ALLYFTCLAFTIKGMLKYIL 
CACTCTTATATTTTACATGTTOAGCCTTTACCATTAAACG'rATOC ' rAAAATATATTCTTG 

VCLNITGCFINPNFNRVGIV 
TATGTCTGJ ' u\TATTACCGGTTGTTTCATCAATCCTAATTTTAATAGACTTGGGAa ' TGTTA 



ATTTQTTAAATGAGTCAAAATGGATG 



TCTTTGATAGGCtCCGTAATACCATTGATT 



TTCAATTAACTGGTGGCGT ' CTCAG^ 



■TATCTGTCAGTAAACTGGCTTCTTATGTCC - 



GTTCCCTTCA?i.CTAGCTCAATTGAT < 



TGCGTCTGCAAATCAAATATTAC 



TAATTGTTTTTTTAAAA 



TACCAATGTTTGCTAGAATGAAAGCATGTAACACATTTe 



TTCTGCTTGTATCACTAATTTCTGTTTTGCCTTGTCTTGCGTTATTCTTTTT 1 ! 

DILS IWINPTFATENYKLMQ 
ATATATTATCA/iTATG G ATAAAGCCTAGATTTGCAAGTGAAAATTATAAATTAATGCAAA 



J-AISYILLSMMTS 
TTTAOCTATAAGTTACATTTOATTGTCAATGATGACATG 



fFrCATTTCTTGTTATTAG 



GAATTGGTAAATCTAAGCTTGI'T i 



TTAAATCTGGTTGCAGGGCTCGCACTTGCTG 

ASTLIAAHYGLYAISMVKI I 
CTTCAACGTOAATCGCAGCTCJ'xTTATOGGCTOTATGCAATATCTATGGTiWuWAATAT 

AFQFYYLYVAFVYFNRAK 
ATCGGGCTTTTCAATTTTATTACGTTTJiTGTAGGTTTTGTCTATTTTAATAGAGCGAAAA 
Start of orf9, End of orf8 



TTTTCAATTACTGAAATCGCAATT 



TTCTTGCACTATT 



MRRIYLDKSILI 
STTAATGCGGAGGATCTATTTAGATAAAAGTATTTTAATT 

LLCLLFFLVI IQLPELNVNG 
CTTTTATGCTTGCTCTTTTTTTTAGTAATCATTCAACTTCCTGAGCTTAATGTAAACGGT 
^VDSLKLSLPLLMVFIAFQK 
TTGGTCGATTCTTTAAAGTTATCACTGCCTTTATTGATGGTCTTTATCGCTTTTCAAAAA 10140 

PKLCLWVI IALLFLNSAFNF 
CCGAAATTATGCTTGTGGGTTATTATTGCATTGTTGTTTTTGAACTCTGCATTTAATTTT 10200 

LYLKTFDKFS SFPFTFFIL L 
TTATATTTAAAGACATTCGATAAGTTTAGCTCATTTCCTTTTACTTTTTTTATATTGCTG 10260 

FYLFRLGIGNLPVYKNK KFY 
TTTTACTTGTTTAGATTGGGAATTGGTAATTTACCGGTTTATAAAAATAA 10320 

ALIFLFILIDIMQSL LINYR 
GCGTTGATTTTTCTCTTTATATTAATAGACATAATGCAGTCATTGTTAATAAATTATAGG 10380 

GQILYSVICILILVFKVNLR 
GGGCAGATTTTATATTCCGTAATTTGCATCCTGATACTTGTGTTTAAAGTTAATTTAAGA 10440 

KKIPYFFLMLPVLYVI IMAY 
AAAAAGATTCCATACTTTTTTTTAATGCTGCCAGTTTTATATGTAATTATTATGGCTTAT 10500 

IGFNYFNKGVTF FEPTASNI 
ATTGGTTTTAATTATTTCAATAAAGGCGTAACTTTTTTTGAACCTACAGCAAGTAATATT 10560 

ERTGMIYYLVSQLGDYIFHG 
GAACGTACGGGGATGATATATTATTTGGTTTCACAGCTTGGTGATTATATATTCCATGGT 10620 

MGTLNFLNNGGQYKTLYGLP 
ATGGGGACATTAAATTTCTTAAATAACGGCGGACAATATAAGACGTTATATGGACTTCCA 10680 

SLIPNDPHDFLLRFFISIGV 
TCATTAATTCCTAATGACCCTCATGATTTTTTATTACGGTTCTTTATAAGTATTGGTGTG 10740 

IGALVYHS IF FVF FRRI SFL 
ATAGGAGCATTGGTTTATCATTCTATATTTTTTC^ 10800 

LYERNAPFIVVSCLLLLQVV 
TTATATGAGAGAAATGCTCCTTTCATTGTTGTAAGTTGTTTGTTACTGTTACAAGTTGTG 10860 

LIYTLNPFDAFNRLICGLTV 
TTAATTTATACATTAAACCCTTTTGATGCTTTTAATCGATTGATTTGCGGGCTTACAGTT 10920 

Start of orf10 End of orf 9 

GVVYGFAKIR* 

MDLQKLDKYTCNGNLDA 
GGAGTTGTTTM^ATTTGCAAAAATTAGArAAGTATACCTGTAATGGAAATTTAGACGC 10980 

m J!L. VSI1 IATYNSELDIAKCL 
TCCACTTGTTTCAATAATCATTGCAACTTATAATTCTGAACTTGATATAGCTAAGTGTTT 11040 

QSVTNQSYKN-IEI IIMDGGS 
GCAATCGGTAACTAATCAATCTTATAAGAATATTGAAATCATAATAATGGATGGAGGATC 11100 

SD KTLDIAKSFKD DRIKIVS 
TTCTGATAAAACGCTTGATATTGCAAAATCGTTTAAAGACGACCGAATAAAAATAGTTTC 11160 

EKDRGIYDAWNKAVDLSIGD 
AGAGAAAGATCGTGGAATTTATGATGCCTGGAATAAAGCAGTTGATTTATCCATTGGTGA 11220 

V* V AFIGSDDVYYHTDAIASL 
TTGGGTAGCATTTATTGGTTCAGATGATGTTTACTATCATACAGATGCAATTGCTTCATT 11280 

MKGVMV SNGAPV VYGRTAHE 
GATGAAGGGGGTTATGGTATCTAATGGCGCCCCTGTGGTTTATGGGAGGACAGCGCACGA 11340 
GPDRNISGFSGSEWYNLTGF 
AGGTCCCGATAGGAACATATCTGGATTTTCAGGCAGTGAATGGTACAACCTAACAGGATT 11400 

KFNYYKCNLPLPIMSAIYSR 
TAAGTTTAATTATTACAAATGTAATTTACCATTGCCCATTATGAGCGCAATATATTCTCG 11460 

DFFRNERFDIKLKIVADADW 
TGATTTCTTCAGAAACGAACGTTTTGATATTAAATTAAAAATTGTTGCTGACGCTGATTG 11520 

FLRCFIKWSKEKSPYFINDT 
GTTTCTGAGATGTTTCATCAAATGGAGTAAAGAGAAGTCACCTTATTTTATTAATGACAC 11580 

TPIVRMGYGGVSTDISSQVK 
GACCCCTATTGTTAGAATGGGATATGGTGGGGTTTCGACTGATATTTCTTCTCAAGTTAA 11640 

TTLESFIVRKKNNISCLNIQ 
AACTACGCTAGAAAGTTTCATTGTACGCAAAAAGAATAATATATCCTGTTTAAACATACA 11700 

LILRYAKILVMVAIKNIFGN 
GCTGATTCTTAGATATGCTAAAATTCTGGTGATGGTAGCGATCAAAAATATTTTTGGCAA 11760 

KLMH NGYHSLKKIKNKI 
TAATGTTTATAAATTAATGCATAACGGGTATCATTCCCTAAAGAAAATCAAGAATAAAAT 11820 

Start of orf11, End of orf10 

M^KIVYI ITGLTCGGAEHLMT 

ATOAAGATTGTTTATATAATAACCGGGCTTACTTGTGGTGGAGCCGAACACCTTATGACG 11880 

QLADQMFIRGHDVNI ICLTG 
CAGTTAGCAGACCAAATGTTTATACGCGGGCATGATGTTAATATTATTTGTCTAACTGGT 11940 

»m«J; EVKPTQNINIHYVNMDKN 
ATATCTGAGGTAAAGCCAACACAAAATATTAATATTCATTATGTTAATATGGATAAAAAT 12000 

FRSFFRALFQVKKIIVALKP 
TTTAGAAGCTTTTTTAGAGCTTTATTTCAAGTAAAAAAAATAATTGTCGCCTTAAAGCCA 12060 

DIIHSHMFHANIFSRFIRML 
GATATAATACATAGTCATATGTTTCATGCTAATATTTTTAGTCGTTTTATTAGGATGCTG 12120 

,L, PAVPLICTAHNK NEG GNAR 
ATTCCAGCGGTGCCCCTGATATGTACCGCACACAACAAAAATGAAGGTGGCAATGCAAGG 12180 

MFCYRLSDFLASIT TNVSKE 
ATGTTTTGTTATCGACTGAGTGATTTTTTAGCTTCTATTACTACAAATGTAAGTAAAGAG 12240 

JLJL qefiar katpknkiveip 

GCTGTTCAAGAGTTTATAGCAAGAAAGGCTACACCTAAAAATAAAATAGTAGAGATTCCG 12300 

NF I.NTNKFDF D INVRK KTRD 
AATTTTATTAATACAAATAAATTTGATTTTGATATTAATGTCAGAAAGAAAACGCGAGAT 12360 

AFNLKDSTAVL LAVGRLVEA 
GCTTTTAATTTGAAAGACAGTACAGCAGTACTGCTCGCAGTAGGAAGACTTGTTGAAGCA 12420 

KDYPNLLNAINHLILSKTSN 
AAAGACTATCCGAACTTATTAAATGCAATAAATCATTTCATTCTTTOVAAAACATCAAAT 12480 

CNDFILLIAGDGALRNKLLD 
TGTAATGATTTTATTTTGCTTATTGCTGGCGATGGCGCATTAAGAAATAAATTATTGGAT 12540 

LVCQLNLVDKVF FLGQRSDI 
TTGGTTTGTCAATTGAATCTTGTGGATAAAGTTTTCTTCTTGGGGCAAAGAAGTGATATT 12600 
AAAGAATTAATGTGTGCTGCAGATCTTTTTGTTTTGAGTTCTGAGTGGGAAGGTTTTGGT 12660 

LVVAEAMACER PVVATDSGG 
CTCGTTGTTGCAGAAGCTATGGCGTGTGAACGTCCCGTTGTTGCTACCGATTCTGGTGGA 12720 

VKEVVGPHNDVIPVSNHILL 
GTTAAAGAAGTCGTTGGACCTCATAATGATGT'rATCCCTGTCAGTAATCATATTCTGTTG 12780 

AEKIAETLKID DNARKI IGM 
GCAGAGAAAATCGCTGAGAC ACTTAAAATAGATGATAACGCAAGAAAAATAATAGGTATG 12840 

KNREYIVSNFS IKTIVSEWE 
AAAAATAGAGAATATATTGTTTCCAATTTTTCAATTAAAACGATAGTGAGTGAGTGGGAG 12900 

End of orf 11 

RLYFKYSKRNNIID* 
CGCTTATATTTTAAATATTCCAAGCGTAATAATATAATTGAT TGAAAATATAAGTTTGTA 12960 

CTCTGGATGCAATAGTTTCTCTATGCTGTTTTTTTACTGGCTCCGTATTTTTACTTATAG 13020 

CTGGATTTTGTTATATATCAGTATOAATCTGTCTCAACTTCATCTAGACTACATTCAAGC 13080 

Start of gnd 

CGCGCATGCGTCGCGCGGTGACTACACCTGACAGGAGTATGTA ATGTCCAAGCAACAGAT 13140 

GVVGMAVMGRNLALN I ESRG 
CGGCGTCGTCGGTATGGCAGTGATGGGGCGCAACCTGGCGCTCAACATCGAAAGCCGCGG 13200 

YTVSIFNRSREKTE EVVAEN 
TTATACCGTCTCCATCTTCAACCGCTCCCGCGAGAAAACTGAAGAAGTTGTTGCCGAGAA 13260 

PDKKLVPYYTVKEFVESLET 
CCCGGATAAGAAACTGGTTCCTTATTACACGGTGAAAGAGTTCGTCGAGTCTCTTGAAAC 13320 

PRRILLMVKAGAGTDAAIDS 
CCCACGTCGTATCCTGTTAATGGTAAAAGCAGGGGCGGGAACTGATGCTGCTATCGATTC 13380 

LKPYLDKGDI IIDGGNTFFQ 
CCTGAAGCCGTATCTGGATAAAGGCGACATCATTATTGATGGTGGCAACACCTTCTTCCA 13440 

DTIRRNRELSAEGFNFIGTG 
GGACACTATCCGTCGTAACCGTGAACTGTCCGCGGAAGGCTTTAACTTCATCGGTACCGG 13500 

VSGGE EGALKG PSIM PG GQK 
CGTGTCCGGCGGTGAAGAGGGCGCCCTGAAAGGCCCATCTATCATGCCAGGTGGCCAGAA 13560 

™~ YELVAp ILTKIAAVAEDG 
AGAAGCGTATGAGCTGGTTGCGCCTATCCTGACCAAGATTGCTGCGGTTGCTGAAGATGG 13620 

EPCITYIGADGAGHYVKMVH 
CGAACCATGTATAACTTACATCGGTGCTGACGGTGCGGGTCACTACGTGAAGATGGTGCA 13680 

™^„ G IEYGDM QLIAEAYSL LKG 
CAACGGTATCGAATATGGCGATATGCAGCTGATTGCTGAAGCCTATTCTCTGCTTAAAGG 13740 

GLNLSNE ELATTFTEWNEGE 
CGGCCTTAATCTGTCTAACGAAGAGCTGGCAACCACTTTTACCGAGTGGAATGAAGGCGA 13800 



13860 
K-YLVDV I LDEA ANKGT GK WT 
TAAATACCTGGTTGATGTGATCCTGGACGAAGCTGCGAACAAAGGCACCGGTAAATGGAC 13920 

SQSSLDLGEPLSLITESVFA 
CAGCCAGAGCTCTCTGGATCTGGGTGAACCGCTGTCGCTGATCACCGAATCCGTATTCGC 13980 

RYIS SLKDQRIAASKVLSGP 
TCGCTAC ATCTCTTCTCTGAAAGACCAGCGCATTGCGGCATCTAAAGTGCTGTCTGGTCC 14040 

QAKLAGDKAEFVEKVR RALY 
GCAGGCTAAACTGGCTGGTGATAAAGCAGAGTTCGTTG AG AAAGTCCGTCGCGCGCTGTA 14100 

LGKIVSYAQGF SQLRA ASDE 
CCTGGGTAAAATCGTCTCTTATGCCCAAGGCTTCTCTCAACTGCGTGCCGCGTCTGACGA 14160 

YNWDLNYGEIAKIFRAGCI I 
ATACAACTGGGATCTGAACTACGGCGAAATCGCGAAGATCTTCCGCGCGGGCTGCATCAT 14220 

RAQFLQKITDAYAENKGIAN 
TCGTGCGCAGTTCCTGCAGAAAATTACTGACGCGTATGCTGAAAACAAAGGCATTGCTAA 14280 

LLLAPYF KNIADEYQ QALRD 
CCTGTTGCTGGCTCCGTACTTCAAAAATATCGCTGATGAATATCAGCAAGCGCTGCGTGA 14340 

VVAYAVQNGI PVPTF S A A V A 
TGTAGTGGCTTATGCTGTGCAGAACGGTATTCCGGTACCGACCTTCTCTGCAGCGGTAGC 14400 

YYDSYRSAVLPANL.IQAQRD 
CTACTACGACAGCTACCGTTCTGCGGTACTGCCGGCTAATCTGATTCAGGCAC AGCGTGA 14460 

YFGAHTYKRTDKEGVFHTG 
TTACTTCGGTGCGCACACGTATAAACGCACTGATAAAGAAGGTGTGTTCCACACCG 14516 
GTAACCAAGGGCGGTACGTGCATAAATTTTAATGCTTATCAAAACTATTAGC ATTAAAAA 60 

Start of orf1 

MNKETVSI IMPVYN 
TATATAAGAAATTCTC AAATGAACAAAGAAACCGTTTCAATAATTATGCCCGTTTACAAT 120 

GAKTI ISSVESI IHQSYQDF 
GGGGCCAAAACTATAATCTCATCAGTAGAATCAATTATACATCAATCTTATCAAGATTTT 180 

VLYIIDDCSTDDTFSLINSR 
GTTTTGTATATCATTGACGATTGTAGCA.CCGATGATACATTTTCATTAATCAACAGTCGA 240 

YKNNQKIRILRNKTNLGVAE 
TACAAAAACAATCAGAAAATAAGAATATTGCGTAACAAGACAAATTTAGGTGTTGCAGAA 300 

SRNYGIEMATGKYISFC DAD 
AGTCGAAATTATGGAATAGAAATGGCCACGGGGAAATATATTTCTTTTTGTGATGCGGAT 360 

DLWHEKKLERQ I EVLNNECV 
GATTTGTGGCACGAGAAAAAATTAGAGCGTCAAATCGAAGTGTTAAATAATGAATGTGTA 420 

DVVCSNYYVIDNNRNIVGEV 
GATGTGGTATGTTCTAATTATTATGTTATAGATAAC AATAGAAATATTGTTGGCGAAGTT 480 

NAPHVINYRKMLMKN YIGNL 
AATGCTCCTCATGTGATAAATTATAGAAAAATGCTCATGAAAAACTACATAGGGAATTTG 540 

TGIYNANKLGKFYQKKIGHE 
ACAGGAATCTATAATGCCAACAAATTGGGTAAGTTTTATCAAAAAAAGATTGGTCACGAG 600 

DYLMWLEI INKTNGAICIQD 
GATTATTTGATGTGGCTGGAAATAATTAATAAAACAAATGGTGCTATTTGTATTCAAGAT 660 

** m „ L AYYMRSN NSLSGNKIKAA 
AATCTGGCGTATTACATGCGTTCAAATAATTCACTATCGGGTAATAAAATTAAAGCTGCA 720 

KWTWSIYREHLHLSFPKTLY 
AAATGGACATGGAGTATATATAGAGAACATTTACATTTGTCCTTTCCAAAAACATTATAT 780 

YFLLYASNGVMKKITHSLLR 
TATTTTTTATTATATGCTTCAAATGGAGTCATGAAAAAAATAAC AC ATTC ACTATTAAGG 840 



AGAAAGGAGACTAAAAAGTOAAGTCAGCGGCTAAGTTGATTTTTTTATTCCTATTTACAC 900 

LYSLQLYGVI IDDRITNFDT 
TTTATAGTCTCCAGTTGTATGGGGTTATCATAGATGATCGTATAACAAATTTTGATACAA 960 

KVLTSIIIIFQIFFVLLFYL 
AGGTATTAACTAGTATTATAATTATATTTCAGATTTTTTTTGTTTTATTATTTTATCTAA 1020 

TIINERKQQKKFIVNWELKL 
CGATTATAAATGAAAGAAAACAGCAGAAAAAATTTATCGTGAACTGGGAGCTAAAGTTAA 1080 

ILVFLFVTIEIAAVVLFLKE 
TACTCGTTTTCCTTTTTGTGACTATAGAAATTGCTGCTGTAGTTTTATTTCTTAAAGAAG 1140 

GIPIFDDDPGGAKLRIAEGN
GTATTCCTATATTTGATGATGATCCAGGGGGGGCTAAACTTAGAATAGCTGAAGGTAATG 1200

GACTTTACATTAGATATATTAAGTATTTTGGTAATATAGTTGTGTTTGCATTAATTATTC

LYDEHKFKQRTI IFVYFTTI 
TTTATGATGAGCATAAATTCAAACAGAGGACCATCATATTTGTATATTTTACAACGATTG 



ATATCCTGTCAAAGGATAACCGTAATCCTAAAATAAAAAGAATAATAGGGTATTTTTTAT 

LVGVVCSLFYLSLGQDGEQN 
TGGTAGGGGTTGTATGCTCGTTGTTTTATCTAAGTTTAGGACAAGACGGAGAACAAAATG 

DSYNNMLRI INRLTIEQVEG 
ACTCATATAATAATATGTTAAGGATAATTAATAGGTTAACAATAGAGCAAGTTGAAGGTG 

VPYVVSESIKNDF FPTPELE 
TTCCATATGTTGTTTCTGAATCTATTAAGAACGATTTCTTTCCGACACCAGAGTTAGAAA 

KELKAI INRIQGIKHQDLFY 
AGGAATTAAAAGCAATAATAAATAGAATACAGGGAATAAAGCATCAAGACTTATTTTATG 

GERLHKQVFGDMGAKFLSVT 
GAGAACGGTTACATAAACAAGTATTTGGAGACATGGGAGCAAATTTTTTATCAGTTACTA 

TYGAELLVFFGFLCVFI I P L 
CGTATGGAGCAGAACTGTTAGTTTTTTTTGGTTTTCTCTGTGTATTCATTATCCCTTTAG 

GIYIPFYLLKRMKKTHSSIN 
GGATATATATACCTTTTTATCTTTTAAAGAGAATGAAAAAAACCCATAGCTCGATAAATT 

CAFYSYI IMILLQYLVAGNA 
GCGCATTCTATTCATATATCATTATGATTTTATTGCAATACTTAGTGGCTGGGAATGCAT 

SAFFFGPFLSVLIMCTPLIL 
CGGCCTTCTTTTTTGGTCCTTTTCTCTCCGTATTGATAATGTGTACTCCTCTGATCTTAT



1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 



LHDTLKRLSRNENISYNCDL 
TGCATGATACGTTAAAGAGATTATCACGAAMSAAAATATCAGTTATAACTGTGACTTAr 

End of orf2

^NNAEGLEKTLS SLSILKIKP 

AATAATGCTGAAGGGTTAGAAAAAACTTTAAGTAGTTTATCAATTTTAAAAATAAAACCT

FEI IIVDGGSTDGTNRVISR 
TTTGAGATTATTATAGTTGATGGCGGCTCTACAGATGGAACGAATCGTGTCATTAGTAGA 

FTSMNITHVYEKDEGIYDAM 
TTTACTAGTATGAATATTACACATGTTTATGAAAAAGATGAAGGGATATATGATGCGATG 

NKGRMLAKGDLIHYLNAGDS
AATAAGGGCCGAATGTTGGCCAAAGGCGACTTAATACATTATTTAAACGCCGGCGATAGC

VIGDIYKNIKEPCLIKVGLF
GTAATTGGAGATATATATAAAAATATCAAAGAGCCATGTTTGATTAAAGTTGGCCTTTTC 

ENDKLLGFSSITHSNTGYCH 
GAAAATGATAAACTTCTGGGATTTTCTTCTATAACCCATTCAAATACAGGGTATTGTCAT

QGVIFPKNHSEYDLRYKICA
CAAGGGGTGATTTTCCCAAAGAATCATTCAGAATATGATCTAAGGTATAAAATATGTGCT 2460 

DYKLIQEVFPEGLRSLSLIT 
GATTATAAGCTTATTCAAGAGGTGTTTCCTGAAGGGTTAAGATCTCTATCTTTGATTACT 2520

SGYVKYDMGGVSSKKRILRD 
TCGGGTTATGTAAAATATGATATGGGGGGAGTATCTTCAAAAAAAAGAATTTTAAGAGAT 2580 

KELAK IMFEKNK KNL I KF I P 
AAAGAGCTTGCCAAAATTATGTTTGAAAAAAATAAAAAAAACCTTATTAAGTTTATTCCA 2640

ISI IKILFPERLRRVLRKMQ 
ATTTCAATAATCAAAATTTTATTCCCTGAACGTTTAAGAAGAGTATTGCGGAAAATGCAA 2700 



Start of orf 4 End of orf 3 

YICLTLFFMKNS SPYDNE* 

M I M N K I 

TATATTTGTCTAACTTTATTCTTCATGAAGAATAGTTCACCATATGATAATGAATAAAAT 2760

KKILKFCTLK KYDTS SALGR 
CAAAAAAATACTTAAATTTTGCACTTTAAAAAAATATGATACATCAAGTGCTTTAGGTAG 2820 

EQERYRIISLSVISSLISKI 

agaacaggaaaggtacaggattatatccttgtctgttatttcaagtttgattagtaaaat 2880 

lsllsliltvsltlpylgqe
actctcactactttctcttatattaactgtaagtttaactttaccttatttaggacaaga 2940 

rfgvwmtitslgaaltfldl 
gagatttggtgtatggatgactattaccagtcttggtgctgctctgacatttttggactt 3000 

gignaltnriahsfacgknl 
aggtataggaaatgcattaacaaacaggatcgcacattcatttgcgtgtggcaaaaattt 3060

kmsrqi sggltllaglsfvi 
aaagatgagtcggcaaattagtggtgggctcactttgctggctggattatcgtttgtcat 3120

taicyitsgmidwqlvikgi 
aactgcaatatgctatattacttctggcatgattgattggcaactagtaataaaaggtat 3180 

nenvyaelqhs ikvfvi ifg 
aaacgagaatgtgtatgcagagttacaacactcaattaaagtctttgtaatcatatttgg 3240 

lgiysngvqkvymgiqkayi 
acttcgaatttattcaaatggtgtgcaaaaagtttatatgggaatacaaaaagcctatat 3300 

snivnaifillsiitlviss 
aagtaatattgttaatgccatatttatattgttatctattattactctagtaatatcgtc 3360 

klhaglpvlivstlgiqyis 
gaaactacatgcgggactaccagttttaattgtcagcactcttggtattcaatacatatc 3420

giyltinli ikrlikftkvn 
gggaatctatttaacaattaatcttattataaagcgattaataaagtttacaaaagttaa 3480 

ihakreapyli lngf fffil 
catacatgctaaaagagaagctccatatttgatattaaacggttttttcttttttatttt 3540

qlgtlatwsgdnfi isitlg 
acagttaggcactctggcaacatggagtggtgataactttataatatctataacattggg 3600

TGTTACTTATGTTGCTGTTTTTAGCATTACACAGAGATTATTTCAAATATCTACGGTCCC 3660

LTIYNIPLWAAYADAHARND 
TCTTACGATTTATAACATCCCGTTATGGGCTGCTTATGCAGATGCTCATGCACGCAATGA 3720

TQFIKKTLRTSLKIVGIS SF 
TACTCAATTTATAAAAAAGACGCTCAGAACATCATTGAAAATAGTGGGTATTTCATCATT 3780 

LLAFILVVFGSEVVNIWTEG 
CTTATTGGCCTTCATATTAGTAGTGTTCGGTAGTGAAGTCGTTAATATTTGGACAGAAGG 3840

KIQVPRTF I IAYALWSVIDA 
AAAGATTCAGGTACCTCGAACATTCATAATAGCTTATGCTTTATGGTCTGTTATTGATGC 3900

FSNTFASFLNGLNIVKQ QML 
TTTTTCGAATACATTTGCAAGCTTTTTAAATGGTTTGAACATAGTTAAACAACAAATGCT 3960 

AVVTLILIAI PAKYI IVSHF 
TGCTGTTGTAACATTGATATTGATCGCAATTCCAGCAAAATACATCATAGTTAGCCATTT 4020 

GLTVMLYCFIFIYIVNYFIW 
T OaOCTAACTOTTA!rCCTOTACTOCOT 4080 

Start of orf5 End of orf4

M K M

YKCSFKKHIDRQLNIRG* 
GTATATATGTAOTTTTAAAAAACATATCGATAGACAGrc^AAATATAAGACG AjjgAAA AgG 4140 

KYIPVYQPSLTGKEKEYVNE 
AAATATATACCAGTTTACCAACCGTCATTGACAGGAAAAGAAAAAGAATATGTAAATGAA 4200



TCTCTGGACTCAACGTCGATCTCATCAAAAGGAAACTATA'r'rCAGAAG'r ' rTOAAAA'rAAA 

FAEQNHVQYAT TVSNGTVAL 
TTTGCGGAACAAAACCATGTGCAATATCCAACTACTGTAAGTAATGGAACGGTTOCTCTT 



CATTTAGCTOTGTTAGCGTTAGGTATATCGGAAGGAGATGAAGTT. 



ACATATATAOCATCAGTa'AATCCTATAAAATACACAGGAGGGACCCCCATTTTCGTTGAT 

SDNETWQMSVSDIEQKITNK 
TC^GATAATa\AACTTGGCAAATGTGTOTTAGTGACATAaAACAxAAAAATGAC'rAA'JAAA 

TKAIMCVHLYGH PCDMEQIV 
ACTAAAGCTATTATGTCTGTGCATTTAmGGGAGATCCATGTGATATGGAiiiCftAAOTG ' rA 



GAACTGGCCAAAAGTAGAAATT 1 



rTGTAATTCAAGATTGCGC 



AAATATAAAGGTAAATATGTGGGAACATTTGGAGATATTTC^AC'rTT'rAGC'r ' rTT ' r'rGGA 

NKTITTGEGGMVVTNDKTLY 
AATAAAAGI'ATTACTACAGOTGAAGGTCGAATGOTTGTGACGAATGACAAAACACTTTAT 

DRCLHFKGQGLAVHRQYWHD 
GAGCG'TTGT'rTAGATOTTAAAGGGCAAGGATTAGCTO'I'AGATAGGGAATATTGOCATGAC 

VIGYNYRMTNICAAIGLAQL 
CTTATAGGCTACAATTATAGGATGACAAATATCTGCGGTOCTATAGGATTAGCCCAGTTA

eqaddfisrkreiadiykkn



S L V Q V H 



ESKDVFHTYWM 



VSILTRTAEEREELRNHLAD 
GTCTCAATTCTAACTAGGACGGCAGAGGAAAGAGAGGAATTAAGGAATCACCTTGCAGAT

KLIETRPVFYPVHTMPMYSE 



KYQKHP IAEDLGWRGINL PS 
AAATATCAAAAGCAGCCTATAGCTCAGGA'rC'r'rGG'FrGGCGTGGAA ' rTAATTTACCTAGT 

FPSLSNEQVIYICES INEFY 
TTCCCCAGCCTATCGAAa'GAGCAAGTTATTTATAT'rTGTGAATG'PATTAACGAATOT'TAT 



4920 
4980 
5040 
5100 
5160 
5220 



End of orf5 Start of orf 6

SDK* MKIALNSD 
AGTOAO'AAAgAGCCTAAAATOTTqTAAAGGTCATTCMSAAAAT'rGCGTTGAATTCAGAT 5280 

GFYEWG GGIDFIKYILS I L E 
GGATTTTAGGAGTGG G GGGGTGGAATOGA TTTTATTAAATATATTCTGTCAATATTAGAA 5340 

TKPEIC IDIL LPRNDIH SLI 
ACGAAACCAGAAATATGTATCGATATTCTTTTACCGAGAAATGATATACATTCTCTTATA 5400

REKAFPFKSILKAILKRERP
AGAGAAAAAGCATTTCCTTTTAAAAGTATATTAAAAGCAATTTTAAAGAGGGAAAGGCCT 5460

RWI SLNRFNEQY YRDAF TQN
CGATGGATTTCATTAAATAGATTTAATGAGCAATACTATAGAGATGCCTTTACACAAAAT 5520

NIETNLTFIKSKS SAFYSYF
AATATAGAGACGAATCTTACCTTTATTAAAAGTAAGAGCTCTGCCTTTTATTCATATTTT 5580 

DSSDCDVILPCMRVPSGNLN 
GATAGTAGCGATTGTGATGTTATTCTTCCTTGCATGCGTGTTCCTTCGGGAAATTTGAAT 5640 

KKAWIGYIYDFQHCYYPSFF 
AAAAAAGCATGGATTGGTTATATTTATGACTTTCAACACTGTTACTATCCTTCATTTTTT 5700 

SKREIDQRNVF FKLMLNCAN 

NI IVNAHSVI TDANKYVGNY 
AATATTATTGTTAATGCACATTCAGTTATTACCGATGCAAATAAATATGTTGGGAATTAT 5820 

SAKLHSLPFSPCPQLKWFAD
TCTGCAAAACTACATTCTCTTCCATTTAGTCCATGCCCTCAATTAAAATGGTTCGCTGAT 5880 

YSGNIAKYNI DKDYF I ICNQ 
TACTCTGGTAATATTGCCAAATATAATATTGACAAGGATTATTTTATAATTTGCAATCAA 5940 

FWKHKDHATAFRAFKIYTEY 
TTTTGGAAACATAAAGATCATGCAACTGCTTTTAGGGCATTTAAAATTTATACTGAATAT 6000

NPDVYLVCTGATQDYRFPGY 
AATCCTGATGTTTATTTAGTATXK:ACGGGAGCTACTCAAGATTATCGATTCCCTGGATAT 6060 

FNELMVLAKKLGIESKI KIL 
TTTAATGAATTGATGGTTTTCGCAAAAAAGCTCGGAATTGAATCGAAAATTAAGATATTA 6120

GHIPKLEQIELIKNCIAVIQ
GGGCATATACCTAAACTTGAACAAATTGAATTAATCAAAAATTGCATTGCTGTAATACAA 6180

PTLFEGGPGGGVTFDAIALG 
CCAACCTTATTTGAAGGCGGGCCTGGAGGGGGGGTAACATTTGACGCTATTGCATTAGGG 6240

KKVI LSDIDVNKEVNCGDVY 
AAAAAAGTTATACTATCTGACATAGATGTCAATAAAGAAGTTAATTGCGGTGATGTATAT 6300 

FFQAKNHYSLNDAMVKADES 
TTCTTTCAGGCAAAAAACCATTATTCATTAAATGACGCGATGGTAAAAGCTGATGAATCT 6360

KIFYEPTTLIELGLKRRNAC 
AAAATTTTTTATGAACCTACAACTCTGATAGAATTGGGTCTCAAAAGACGCAATGCGTGT 6420



End of orf6

ADFLLDVVKQEIESRS *
GCAGATTTTCTTTTAGATGTTGTGAAACAAGAAATTGAATCCCGATCT TAATATATTCAA 6480



Start of orf 7 

MTKVALITGVTGQDGSY 
GAGGTATATAATGACTAAAGTCGCTCTTATTACAGGTGTAACTGGACAAGATGGATCTTA 6540

TCTAGCTGAGTTTTTGCTTGATAAAGGGTATGAAGTTCATGGTATCAAACGCCGAGCCTC 6600 

SFNTERIDHIYQDPHGSNPN 
ATCTTTTAATACAGAACGCATAGACCATATTTATCAAGATCCACATGGTTCTAACCCAAA 6660

FHLHYGDLTDS SNLTRILKE 
TTTTCACTTGCACTATGGAGATCTGACTGATTCATCTAACCTCACTAGAATTCTAAAGGA 6720 

VQPDEVYNLA AMSHVAV S FE 
GGTACAGCCAGATGAAGTATATAATTTAGCTGCTATGAGTCACGTAGCAGTTTCTTTTGA 6780 

SPEYTADVDAIGTLRL LEAI 
GTCTCCAGAATATACAGCCGATGTCGATGCAATTGGTACATTACGTTTACTGGAAGCAAT 6840 

RFLGLENKTRFYQAST S ELY 
TCGCTTTCTAGGATTGGAAAACAAAACGCGTTTCTATCAAGCTTCAACCTCAGAATTATA 6900

GLVQEI PQKESTPFYPRSPY 
TGGACTTGTTCAGGAAATCCCTCAAAAAGAATCCACCCCTTTTTATCCTCGTTCCCCTTA 6960 

AVAKLYAYWITVNYRESYGI 
TGCAGTTGCAAAACTTTACGCATATTGGATCACGGTAAATTATCGAGAGTCATATGGTAT 7020 

YACNGILFNHESPRRGETFV 
TTATGCATGTAATGGTATATTGTTCAATCATGAATCTCCACGCCGTGGAGAAACGTTTGT 7080 

TRKITRGLANIAQGLESCLY 
AACAAGGAAAATTACTCGAGGACTTGCAAATATTGCACAAGGCTTGGAATCATGTTTGTA 7140

LGNMDS LRDWGHAKDYVRMQ 
TTTAGGGAATATGGATTCGTTACGAGATTGGGGACATGCAAAAGATTATGTTAGAATGCA 7200

WLMLQQEQPEDFVIATGVQY 
ATGGTTGATGTTACAACAGGAGCAACCCGAAGATTTTGTGATTGCAACAGGAGTCCAATA 7260

SVRQFVEMAAAQLGI KMSFV 
CTCAGTCCGTCAGTTTGTCGAAATGGCAGCAGCACAACTTGGTATTAAGATGAGCTTTGT 7320

GKGIEEKGIVDSVEGQDAPG
TGGTAAAGGAATCGAAGAAAAAGGCATTGTAGATTCGGTTGAAGGACAGGATGCTCCAGG 7380

VKPGDVIVAVDPRYFRPAEV 
TGTGAAACCAGGTGATGTCATTGTTGCTGTTGATCCTCGTTATTTCCGACCAGCTGAAGT 7440 

DTLLGDPSKANLKLGWRPEI 
TGATACTTTGCTTGGAGATCCGAGCAAAGCTAATCTCAAACTTGGTTGGAGACCAGAAAT 7500

TLAEMISEMVAKDLEA AKKH 
TACTCTTGCTGAAATGATTTCTGAAATGGTTGCCAAAGATCTTGAAGCCGCTAAAAAACA 7560 



Start of orf8 End of orf7

M M M N K
SLLKSHGFSVSLALE* 
TTCTCTTTTAAAATCGCATGGTTTTTCTGTAAGCTTAGCTCTGGAATGATGATGAATAAG 7620 

QRIFIAGHQGMVGSAITRRL 
CAACGTATTTTTATTGCTGGTCACCAAGGAATGGTTGGATCAGCTATTACCCGACGCCTC 7680 

KQRDDVELVLRTRDELNLLD
AAACAACGTGATGATGTTGAGTTGGTTTTACGTACTCGGGATGAATTGAACTTGTTGGAT 7740 

SSAVLDFFSSQKIDQVYLAA 
AGTAGCGCTGTTTTGGATTTTTTTTCTTCACAGAAAATCGACCAGGTTTATTTGGCAGCA 7800 

AKVGGILANS SYPADFIYEN 
GCAAAAGTCGGAGGTATTTTAGCTAACAGTTCTTATCCTGCCGATTTTATATATGAGAAT 7860 

IMIEANVIHAAHKNNVNKLL 
ATAATGATAGAGGCGAATGTCATTCATGCTGCCCACAAAAATAATGTAAATAAACTGCTT 7920 

FLGSSCIYPKLAHQPIMEDE 
TTCCTCGGTTCGTCGTGTATTTATCCTAAGTTAGCACACCAACCGATTATGGAAGACGAA 7980 

LLQGKLEPTNEPYAIAKIAG 
TTATTACAAGGGAAACTTGAGCCAACAAATGAACCTTATGCTATCGCAAAAATTGCAGGT 8040 

IKLCESYNRQFGRDYRSVMP 
ATTAAATTATGTGAATCTTATAACCGTCAGTTTGGGCGTGATTACCGTTCAGTAATGCCA 8100 

TNLYGPNDNFH P SNSHV I PA 
ACCAATCTTTATGGTCCAAATGACAATTTTCATCCAAGTAATTCTCATGTGATTCCGGCG 8160 

LFRRFHDAVENNS PNVVVWG
CTTTTCCGCCGCTTTCATGATGCTGTGGAAAACAATTCTCC^ 8220 

SGTPKREFLHVDDMASASIY 
AGTGGTACTCCAAAGCGTGAATTCTTACATGTAGATGATATGGCTTCTO 8280 

VMEMPYDIWQKNTKVMLSHI 
GTCATGGAGATGCC^TACGATATATGGCAAAAAAATACTAAAGTAATGTTGTCTC ATATC 8340 

NIGTGIDCTICELAETIAKV 
AATATTGGAACAGGTATTGACTGCACGATTTGTGAGCTTGCGGAAACAATAGCAAAAGTT 8400

VGYKGHITFDT TKPDGAPRK 
GTAGGTTATAAAGGGCATATTACGTTCGATACAACAAAGCCCGATGGAGCCCCTCGAAAA 8460 

LLDVTLLHQLGWNHKITLHK
CTACTTGATGTAACGCTTCTTCATCAACTAGGTTGGAATCATAAAATTACCCTTCACAAG 8520

End of orf8

GLENTYNWFLENQLQYRG * 
GGTCTTGAAAATACATACAACTGGTTTCTTGAAAACCAACTTCAATATCGGGGG TAATAA 8580

Start of orf9

MFLHSQDFATIVRSTPLISI
2^ipTTTTACATTCCCAAGACTTTGCCACAATTGTAAGGTCTACTCCTCTTATTTCTATAG 8640

DLIVENEFGEIL LGKRINRP
ATTTGATTGTGGAAAACGAGTTTGGCGAAATTTTGCTAGGAAAACGAATCAACCGCCCGG 8700

AQGYWFVPG GRVLKDEKLQT
CACAGGGCTATTCGTTCGTTCCTGGTGGTAGGGTGTTGAAAGATGAAAAATTGCAGACAG 8760

AFERLTEIELGIRLPLSVGK
CCTTTGAACGATTGACAGAAATTGAACTAGGAATTCGTTTGCCTCTCTCTGTGGGTAAGT 8820

FYGIWQHFYEDNSMGGDFST
TTTATGGTATCTGGCAGCACTTCTACGAAGACAATAGTATGGGGGGAGACTTTTCAACGC 8880

HYIVIAFLLKLQPNILKLPK
ATTATATAGTTATAGCATTCCTTCTTAAATTACAACCAAACATTTTGAAATTACCGAAGT 8940

SQHNAYCWLSRAKLINDDDV
CACAACATAATGCTTATTGCTGGCTATCGCGAGCAAAGCTGATAAATGATGACGATGTGC 9000

HYNCRAYFNNKTNDAIGLDN
ATTATAATTGTCGCGCATATTTTAACAATAAAACAAATGATGCGATTGGCTTAGATAATA 9060

Start of orf 10 End of orf9

MSDAPI IAVVMAGGTGS
KDIICLMRQ*

AGGATATAATATGTCTGATGCGCCAA TAATTGCTGTAGTTATGGCCGGTGGTACAGGCAG 9120

WPLSRELYPKQ 



WTLLQTTLLRLSGLSCQKPL
TAACACCTTGTTACAAACGACTTTGCTACGACTTTCAGGCCTATCATGTCAAAAACCATT 9240

VITNEQHRFVVAEQLREINK 
AGTGATAACAAATGAACAGCATCGCTTTGTTGTGGCTGAACAGTTAAGGGAAATAAATAA 9300 

LNGNI ILEPCGRNTAPAIAI 
ATTAAATGGTAATATTATTCTAGAACCATGCGGGCGAAATACTGCACCAGCAATAGCGAT 9360 

S AFHAL KRNPQEDPLLLVLA 
ATCTGCGTTTCATGCGTTAAAACGTAATCCTCAGGAAGATCCATTGCTTCTAGTTCTTGC 9420 

ADHVIAKESVFCDAIKNATP 
GGCAGACCACGTTATAGCTAAAGAAAGTGTTTTCTGTGATGCTATTAAAAATGCAACTCC 9480 

IANQGKIVTFGI IPEYAETG 
CATCGCTAATCAAGGTAAAATTGTAACGTTTGGAATTATACCAGAATATGCTGAAACTGG 9540 

YGYIERGELSVPLQGHENTG 
TTATGGGTATATTGAGAGAGGTGAACTATCTGTACCGCTTCAAGGGCATGAAAATACTGG 9600 

FYYVNKFVEKPNRETAELYM 
TTTTTATTATGTAAATAAGTTTGTCGAAAAGCCTAATCGTGAAACCGCAGAATTGTATAT 9660 

TSGNHYWNSGIFMFKASVYL
GACTTCTGGTAATCACTATTGGAATAGTGGAATATTCATGTTTAAGGCATCTGTTTATCT 9720

TGAGGAATTGAGAAAATTTAGACCTGACATTTACAATGTTTGTGAACAGGTTGCCTCATC 9780

SYIDLDFIRLSKEQFQDCPA 
CTCATACATTGATCTAGATTTTATTCGATTATCAAAAGAACAATTTCAAGATTGTCCTGC 9840 

ESIDFAVMEKTEKCVVCPVD 
TGAATCTATTGATTTTGCTGTAATGGAAAAAACAGAAAAATGTGTTGTATGCCCTGTTGA 9900 

IGWSDVGSWQSLWDISLKSK 
TATTGGTTGGAGTGACGTTGGATCTTGGCAATCGTTATGGGACATTAGTCTAAAATCGAA 9960 

TGDVCKGDILTYDTKNNYIY 
AACAGGAGATGTATGTAAAGGTGATATATTAACCTATGATACTAAGAATAATTATATCTA 10020 

SESALVAAIGI EDMVIVQTK 
CTCTGAGTCAGCGTTGGTAGCCGCCATTGGAATTGAAGATATGGTTATCGTGCAAACTAA 10080 

DAVLVSKKSDVQHVKKIVEM 
AGATGCCGTTCTTGTGTCTAAAAAGAGTGATGTACAGCATGTAAAAAAAATAGTCGAAAT 10140 

LKLQQRTEYISHREVFRPWG 
GCTTAAATTGCAGCAACGTACAGAGTATATTAGTCATCGTGAAGTTTTCCGACCATGGGG 10200 

KFDSIDQGERYKVKKIIVKP 
AAAATTTGATTCGATTGACCAAGGTGAGCGATACAAAGTCAAGAAAATTATTGTGAAACC 10260

GEGLSLRMHHHRSEHWIVLS 
TGGTGAGGGGCTTTCTTTAAGGATGCATCACCATCGTTCTGAACATTGGATCGTGCTTTC 10320

GTAKVTLGDKTKLVTANESI 
TGGTACAGCAAAAGTAACCCTTGGCGATAAAACTAAACTAGTCACCGCAAATGAATCGAT 10380

YIPLGAAYSLENPGI IPLNL 
ATACATTCCCCTTGGCGCAGCGTATAGTCTTGAGAATCCGGGCATAATCCCTCTTAATCT 10440 

IEVSSGDYLGEDDIIRQKER 
TATTGAAGTCAGTTCAGGGGATTATTTGGGAGAGGATGATATTATAAGACAGAAAGAACG 10500 

End of orf 10 Start of orf 11

Y K H E D * MKSLTCFKAYDIR
TTACAAACATGAAGAT TAACATATGAAATCTTTAACCTGCTTTAAAGCCTATGATATTCG 10560

GKLGEELNEDIAWRIGRAYG 
CGGGAAATTAGGCGAAGAACTGAATGAAGATATTGCCTGGCGCATTGGGCGTGCCTATGG 10620 

EFLKPKTIVLG GDVRLTSEA
CGAATTTCTCAAACCGAAAACCATTGTTTTAGGCGGTGATGTCCGCCTCACCAGCGAAGC 10680 

LKLALAKGLQDAGVDVLDI G 
GTTAAAACTGGCGCTTGCGAAAGGTTTACAGGATGCGGGCGTCGATGTGCTGGATATCGG 10740 

MSGTEEIYFATFHLGVDGGI
TATGTCCGGCACCGAAGAGATCTATTTCGCCACGTTCCATCTCGGAGTGGATGGCGGCAT 10800 

E VTASHNPMDYNGMKLVREG 
CGAAGTTACCGCCAGCCATAACCCGATGGATTACAACGGCATGAAGCTGGTGCGCGAAGG 10860 

A RPISGDTGLRDVQRLAEAN 
GGCTCGCCCGATCAGCGGTGATACCGGACTGCGCGATGTCCAGCGTCTGGCAGAAGCCAA 10920 

DFP PVDETKRGRYQ Q INLRD 
TGACTTCCCTCCTGTCGATGAAACCAAACGTGGTCGCTATCAGCAAATCAATCTGCGTGA 10980

AYVDHLPGYINVKNLTPL KL
CGCTTACGTTGATCACCTGTTCGGTTATATCAACGTCAAAAACCTCACGCCGCTCAAGCT 11040 

VINSGNGAAG PVVDAI EARF 
GGTGATCAACTCCGGGAACGGCGCAGCGGGTCCGGTGGTGGACGCCATTGAAGCCCGATT 11100

KALGAPVELIKVHNTPDGNF 
TAAAGCCCTCGGCGCACCGGTGGAATTAATCAAAGTACACAACACGCCGGACGGCAATTT 11160 

PNGI PNPLLPECRDDTRNAV 
CCCCAACGGTATTCCTAACCCGCTGCTGCCGGAATGCCGCGACGACACCCGTAATGCGGT 11220 

IKHGADMGIAFDGDFDRCFL 
CATCAAACACGGCGCGGATATGGGCATTGCCTTTGATGGCGATTTTGACCGCTGTTTCCT 11280 

FDEKGQFIEGYYIVGLLAEA 
GTTTGACGAAAAAGGGCAGTTTATCGAGGGCTACTACATTGTCGGCCTGCTGGCAGAAGC 11340 

FLEKNPGAKI IHDPRLSWNT 
GTTCCTCGAAAAAAATCCCGGCGCGAAGATCATCCACGATCCACGTCTCTCCTGGAACAC 11400 

VDVVTAAGGT PVMS KTGHAF 
CGTTGATGTGGTGACTGCCGCAGGCGGCACCCCGGTAATGTCGAAAACCGGACACGCCTT 11460 

IKERMRKEDAIYGGEMSAHH 
TATTAAAGAACGTATGCGCAAGGAAGACGCCATCTACGG'PGGCGAAATGAGCGCTCACCA 11520 

YFRDFAYCDSGMIPWLLVAE 
TTACTTCCGTGATTTCGCTTACTGCGACAGCGGCATGATCCCGTGGCTGCTGGTCGCCGA 11580 

LVCLKGKTLGEMVRDRMAAF 
ACTGGTGTGCCTGAAAGGAAAAACGCTGGGCGAAATGGTGCGCGACCGGATGGCGGCGTT 11640 

PASGEINSKLAQPVEAINRV 
TCCGGCAAGCGGTGAGATCAACAGCAAACTGGCGCAACCCGTTGAGGCAATTAATCGCGT 11700 

EQHFSREALAVDRTDGISMT 
GGAACAGCATTTTAGCCGCGAGGCGCTGGCGGTGGATCGCACCGATGGCATCAGCATGAC 11760 

FADWRFNLRS SNTEPVVRLN 
CTTTGCCGACTGGCGCTTTAACCTCCGCTCCTCCAACACCGAACCGGTGGTGCGGTTGAA 11820 

VESRGDVKLMEKKTKALLKL 
TGTGGAATCACGCGGTGATGTAAAGCTAATGGAAAAGAAAACTAAAGCTCTTCTTAAATT 11880 

End of orf 11 

L S E *

GCTAAGTGAG TGATTATTTACATTAATCATTAAGCGTATTTAAGATTATATTAAAGTAAT 11940 
GTTATTGCGGTATATGATGAATATGTGGGCTTTTTTATGTATAACGACTATACCGCAACT 12000 

Start of H- repeat 

TTATCTAGGAAAAGATTAATAGAAATAAAGTTTTGTACTGACCAATTTGCATTTCACGTC 12060 

ACGATTGAGACGTTCCTTTGCTTAAGACATTTTTTCATCGCTTATGTAATAACAAATGTG 12120

CCTTATATAAAAAGGAGAACAAAATGGAACTTAAAATAATTGAGACAATAGATTTTTATT 12180

ATCCCTGTTTACGATATTATAGCCAAAGTTGTATCCTGCATCAGTCCTGCAATATTTCAC 12240 

GAGTGCTTTGTTAACTGAATACATGTCTGCCATTTTCCAGATGATAACGACGTCATCGCA 12300 

ATTGATGGTAAAACACTTCGGCACACTTATGACAAGAGTCGTCGCAGAGGAGTGGTTCAT 12360

GTCATTAGTGCGTTTCAGCAATGCACAGTCTGGTCCTCGGATAGATCAAGACGGATGAGA 12420

AACCTAATGCGTTCACAGTTATTCATGAACTTTCTAAAATGATGGGTATTAAAGGAAAAA 12480 

TAATCATAACTGATGCGATGGCTTGCCAGAAAGATATTGCAGAGAAGATATAAAAACAGA 12540 

GATGTGATTA1TTATTCGCTGTAAAAGGAAATAAGAGTCGGCTTAATAGAGTCTTTGAGG 12600 

AGATATTTACGCTGAAAGAATTAAATAATCCAAAACATGACAGTTACGCAATTAGTGAAA 12660 

AGAGGCACGGCAGAGACGATGTCCGTCTTCATATTGTTTGAGATGCTCCTGATGAGCTTA 12720 

TTGATTTCACGTTTGAATGGAAAGGGCTGCAGAATTTATGAATGGCAGTCCACTTTCTCT 12780 

CAATAATAGCAGAGCAAAAGAAAGAATCCGAAATGACGATCAAATATTATATTAGATCTG 12840

CTGCTTTAACCGCAGAGAAGTTCGCCACAGTAAATCGAAATCACTGGCGCATGGAGAATA 12900

AGTTGCACAGTAGCCTGATGTGGTAATGAATGAAATCGACTATAATATAAGAAGGCGAGT 12960 

TGCATTCGAATGATTTTCTAGAATGCGGCACATCGCTATTAATATCTGACAATGATAATG 13020 

TATTCAAGGCAGGATTATCATGTAAGATGCGAAAAGCAGTCATGGACAGAAACTTCCTAG 13080 

CGTCAGGCATTGCAGCGTGCGGGCTTTCATAATCTTGCAT 2^^TTTGATAAGATATTTC 13140

Start of orf12

MNLYGI FGAG SYGRE

TTTGGAGATGGGAAAATGAATTTGTATGGTATTTTTGGTGCTGGAAGTTATGGTAGAGAA 13200

TIPILKQQIKQECGSDYALV 

ACAATACCCATTCTAAATCAACAAATAAAGCAAGAATGTGGTTCTGACTATGCTCTGGTT 13260 

FVDDVLAGKKVNGFEVLSTN

TTTGTGGATGATGTTTTGGCAGGAAAGAAAGTTAATGGTTTTGAAGTGCTTTCAACCAAC 13320 

CFLKAPYLKKYFNVAIANDK

TGCTTTCTAAAAGCCCCTTATTTAAAAAAGTATTTTAATGTTGCTATTGCTAATGATAAG 13380 

IRQRVSESILLHGVEPITIK

ATACGACAGAGAGTGTCTGAGTCAATATTATTACACGGGGTTGAACCAATAACTATAAAA 13440 

HPNSVVYDHTMIGSGAI ISP 

CATCCAAATAGCGT1X3TTTATGATCATACTATGATAGGTAGTGGCGCTATTATTTCTCCC 13500 

FVTISTNTHIGRFFHANIYS

TTTGTTACAATATCTACTAATACTCATATAGGGAGGTTTTTTCATGCAAACATATACTCA 13560 

YVAHDCQIGDYVTFAPGAKC 

TACGTTGCACATGATTGTCAAATAGGAGACTATGTTACATTTGCTCCTGGGGCTAAATGT 13620 

NGYVVIEDNAYIGSGAVIKQ 

AATGGATATGTTGTTATTGAAGACAATGCATATATAGGCTCGGGTGCAGTAATTAAGCAG 13680 

GVPNRPLIIGAGAIIGMGAV

GGTGTTCCTAATCGCCCACTTATTATTGGCGCGGGAGCCATTATAGGTATGGGGGCTGTT 13740 

VTKSVPAGITVCGNPAREMK 

GTCACTAAAAGTGTTCCTGCCGGTATAACTGTGTGCGGAAATCCAGCAAGAGAAATGAAA 13800

End of orf 12 

R S P T S I * 

AGATCGCCAACATCTATT TAATGGGAATGCGAAAACACGTTCCAAATGGGACTAATGTTT 13860
AAAATATATAT^TTTCGCTAATTTACTAAATTATGGCTTCTTTTTAAGCTATCCTTTAC 13920
TTAGTTATTACTGATACAGCATGAAATTTATAATACTCTGATACATTTTTATACGTTATT 13980 
CAAGCCGCATATCTAGCX3GTAACCCCTGACAGGAGTAAACAATG 14024

GTTGACAAATACCGACCGTATAATGAATCAAACGTTCTGGATTGGTATTTATCCAGGCTT 60

GACTACAGAGCATTTAGATTATGTCGTAAGTAAGTTTCAAGAATTTTTTGGTTTAAATTT 120 

Start of abe

MLDVNKKILMTGAT
CTAATTTTTAGGATAGGATGCTTGATGTGAATAAGAAAATCCTAATGACTGGCGCTACTA 180

SFVGTHLLHSLIKEGYSI IA 
GCTTTGTAGGTACCCATCTACTACATAGTCTCATAAAGGAAGGTTATAGTATTATTGCAT 240 

LKRPITEPTI INTLIEWLNI 
TAAAGCGTCCTATAACCGAGCCAACGATTATCAATACCTTGATTGAATGGTTGAATATAC 300 

QDIEKICQS SMNIHAIVHIA 
AAGATATAGAAAAAATATGTCAATCATCTATGAATATTCATGCGATTGTCCATATTGCAA 360 

TDYGRNRTPISEQYKCNVLL 
CAGACTATGGTCGAAACAGAACCCCTATATCTGAACAATATAAATGTAATGTCCTATTAC 420 

PTRLLELMPALKTKFFISTD 
CAACAAGACTGCTTGAGTTAATGCCAGCGCTTAAAACGAAATTCTTTATTTCTACTGACT 480 

SFFGKYEKHYGYMRSYMASK 
CTTTTTTTGGGAAATATGAGAAGCACTATGGATATATGCGTTCTTACATGGCATCTAAAA 540 

RHFVELSKIYVE EHPDVCFI 
GACATTTTGTAGAACTATCAAAAATATACGTAGAGGAACATCCAGACGTTTGTTTTATAA 600 

NLRLEHVYGERDKAGKI IPY 
ATTTACGTTTAGAACATGTTTACGGTGAGAGGGATAAAGCAGGTAAAATAATCCCGTATG 660 

VIKKMKNNEDIDCTIARQKR 
TTATCAAAAAAATGAAAAACAATGAAGATATTGATTGTACGATCGCCAGGCAGAAAAGAG 720 

DFIYIDDVVSAYLKILKEGF 
ATTTTATTTATATAGACGATGTTGTTTCGGCCTATTTGAAAATTTTAAAGGAGGGTTTTA 780

NAGHYDVEVGTGKSIELKEV 
ACGCTGGACACTATGATGTCGAGGTGGGGACTGGAAAATCGATAGAGCTAAAAGAAGTGT 840 

FEIIKKETHSSSKINYGAVA 
TTGAGATAATAAAAAAAGAAACGCATAGTAGTAGTAAGATAAATTATGGTGCAGTTGCGA 900 

MRDDEIMESHANTSFLTRLG 
TGCGTGATGATGAGATTATGGAGTCACATGCAAATACCTCTTTCTTGACTCGATTAGGTT 960 

End of abe Start of wzx

M 

WSAEFSIEKGVKKMLSMKE* 
GGAGTGCCGAGTTTTCTATTGAGAAGGGTGTGAAAAAAATGTTGAGTATGAAAGAG TAAT 1020 

NRI IRMLGVDKAIRYVIFGK 
GAATCGTATTATTAGAATGTTAGGTGTAGATAAAGCAATTCGTTATGTTATTTTTGGTAA 1080 

IISVLTGLLLIMLISHHLSK
GATAATATCTGTATTAACGGGTTTACTGTTAATAATGTTAATATCACACCATTTATCTAA 1140 

DAQGYYYTFNSVVALQI IFE 
AGACGCACAGGGCTATTATTATACATTTAATTCAGTAGTGGCACTACAGATAATATTTGA 1200

LGLSTVI IQFASHEMSALKY
ATTGGGGCTATCAACGGTAATCATTCAATTCGCTAGCCATGAAATGTCAGCGTTAAAATA 1260 

DYSERDI IGESKNKQRYLSL 
TGATTATTCTGAACGAGATATTATAGGTGAAAGTAAAAATAAGCAACGTTACCTATCGTT 1320

FRLAIKWYAVIALLIILIVG 
ATTTCGGTTGGCAATAAAATGGTATGCAGTAATAGCTTTGCTAATAATATTAATAGTCGG 1380 

PIGYVFFTQKEGLGVPWQGA 
TCCCATCGGGTATCTTTTTTTTACGCAAAAAGAAGGCTTAGGTGTACCTTGGCAAGGGGC 1440 

WLLLTIVTAFNIFLVSVLSV 
ATGGTTATTATTAACAATAGTTACAGCTTTTAATATTTTTCTTGTTTCTGTACTTTCTGT 1500

AEG SGLITDVNKMRMYQSL L 
CGCTGAAGGGAGTGGGTTAATTACTGATGTGAATAAAATGAGAATGTATCAGTCGCTGTT 1560 

AGILAVSLLISGFGLYATSA 
AGCTGGTATATTGGCAGTAAGCTTACTTATTAGTGGCTTTGGACTATATGCTACGTCTGC 1620 

IAISGTIIFSIFSYKYFKKI 
AATAGCTATTTCAGGGACTATCATATTCTCCATATTTTCATATAAGTATTTTAAAAAAAT 1680 

FLQSLKHKNKYTEGGI SWVN 
TTTCCTGCAATCTTTAAAGCATAAAAATAAATATACTGAAGGTGGTATTTCATGGGTTAA 1740 

EIFPMQWRIALSWMSGYFIY 
TGAAATATTTCCTATGCAATGGCGAATTGCTCTAAGTTGGATGTCAGGGTATTTTATTTA 1800 

FVMTPIAFKYFGAIYAGQLG 
TTTTGTTATGACCCCCATTGCATTCAAATATTTCGGGGCTATATATGCAGGGCAGTTAGG 1860

MSLTLCNMVMATGLAWISTK 
GATGTCTTTAACATTATGCAATATGGTAATGGCTACGGGCCTGGCTTGGATATCCACTAA 1920

YPKWGVMVSKKQLAELSKSF 
ATATCCAAAATGGGGAGTAATGGTTTCCAACAAACAGCTTGCGGAACTGAGTAAATCGTT 1980

KSAVMQS SFFVLTGLTGVYI 
CAAAAGTGCAGTAATGCAATCATCCTTTTTTGTCTTGACAGGATTAACTGGTGTATACAT 2040 

SLWLLKLSGSNIGERFLGLQ 
TTCATTATGGTTATTGAAATTATCTGGTTCAAACATTGGCGAGCGGTTTTTGGGATTGCA 2100 

DFFFLSLAI IGNHIVAC FAT 
GGATTTTTTCTTTTTATCTTTAGCAATTATTGGTAATCACATTGTAGCTTGCTTTGCAAC 2160 

YIRAHKTEKMTLASCIMAL L 
CTATATAAGAGCGCATAAAACTGAAAAAATGACATTGGCATCATGTATAATGGCTCTCTT 2220 

TITTMLFVAYLEYSRFYMLM 
GACTATAACTACAATGTTGTTTGTTGCATATTTAGAGTACTCGAGGTTCTACATGTTAAT 2280 

YAALTWLYFVPQTYI IFKRF 

S L K D 

GTATGCAGCACTAACGTGGTTATATTTTGTTCCTCAAACTTATATAATCTTTAAAAGATT 2340

Start of wbaR End of wzx



2400 

SSCLARL LDS I IQQENYCHD 
CTTCATGTTTGGCTCGTTTACTTGATAGTATAATTCAACAGGAGAACTATTGTCATGATG 2460

ELEVIVCDNASTDETARIAK 
AACTCGAGGTTATTGTTTGTGATAATGCTTCAACAGATGAAACAGCAAGAATAGCCAAGA 2520

SGLDKIRNSTYHLNE ENLGM 
GTGGCTTAGATAAAATAAGAAATAGTACTTATCATCTAAATGAAGAAAACTTAGGAATGG 2580 

DGNF QKC FEL SNGKYLWMI G 
ATGGTAACTTCCAGAAATGTTTTGAGTTATCAAATGGAAAATATCTTTGGATGATTGGCG 2640 

DDDLIVKNGI SKVFSILKSR 
ATGATGATCTAATAGTCAAAAATGGTATTTCGAAGGTTTTTTCGATATTAAAGTCCCGGC 2700 

PALDMVYVNSA AKTELNYNA 
CTGCATTAGATATGGTGTATGTAAATTCAGCAGCAAAGACTGAGTTAAACTATAATGCTG 2760 

DVRTSFYTNDVDFISDVKVM 
ATGTGAGGACGTCATTCTACACAAATGATGTAGATTTTATTTCAGACGTGAAAGTTATGT 2820

FTFI SGMICK KTDAIVKAVG 
TCACGTTTATTTCTGGAATGATATGTAAGAAAACTGATGCAATTGTCAAAGCCGTTGGTA 2880 

IFSPQTTGKYLMHLTWQLPL 
TTTTCAGTCCGCAAACTACTGGAAAATATCTTATGCATTTAACATGGCAATTGCCATTAC 2940 

LKQ GGEFAVIHNNIIEAEPD 
TTAAACAGGGTGGAGAGTTCGCAGTTATCCATAATAATATAATTGAGGCTGAGCCAGATA 3000

NSGGYHLYKVF SNNLATI FD 
ATTCAGGTGGATATCATTTATATAAGGTTTTTTCTAATAATCTTGCGACAATCTTTGATG 3060 

VFYPREHRVSKRVRASACLF 
TTTTTTATCCCAGAGAGCACCGTGTAAGTAAAAGAGTTCGCGCATCAGCATGTTTATTCT 3120 

LLNFIGDEDKTKNFATNNYL 
TACTTAACTTCATAGGCGATGAAGATAAAACCAAAAATTTTGCTAGAAATAATTATTTAA 3180 

RDCDSAFIDLI IYKYGLRF F 
GAGATTGCGATAGTGCATTTATAGATTTAATTATATATAAATATGGGCTTAGGTTTTTCT 3240 

YLYPKTVPLFRKIKYI IKTV 
ATCTATATCCTAAAACTGTGCCTTTATTTAGAAAAATAAAATATATTATAAAGACGGTTT 3300 

End of wbaR 

L M R K * 

TAATGCGGAAATAAAAATTATTCAAGATGGTTTGCTGAAAACGACTTATAGGACTATCTA 3360 
Start of wbaL 

MFVYSLRLKLNLI ISLLSKV 
ATGTTTGTCTATAGTTTAAGATTAAAATTAAATCTTATCATATCATTATTGAGTAAAGTT 3420

RRKSKAKFLVL LSGYDFKMV 
AGGCGGAAATCAAAAGCAAAGTTTCTTGTTCTGCTTAGCGGATATGATTTTAAAATGGTT 3480

GKNFKLNVKPYSAKN NTSSK
GGGAAGAATTTTAAATTGAATGTCAAACCTTACTCTGCAAAAAATAACACCTCTTCCAAA 3540 

WGSMRVGDNCWIEAVYNYGD 
TGGGGTAGTATGCGGGTTGGTGATAACTGCTGGATTGAAGCTGTATATAATTATGGTGAT 3600

EKFEPYLYIGDRICL SDNVH 
GAAAAATTTGAACCTTATTTGTACATAGGTGATCGTATATGTTTAAGTGATAATGTTCAT 3660 

ISCVSCLILENDILIGSKVY 
ATTTCTTGCGTATCATGTTTAATTTTAGAAAACGATATATTAATTGGTAGCAAAGTTTAT 3720 

IGDHSHGSYKVCSPKIEP PA 
ATAGGCGATCATAGCCATGGCAGTTATAAAGTATGCAGTCCGAAAATAGAACCGCCAGCA 3780

NKPLGDIAPIKIGNC CWIGD 
AATAAGCCATTAGGTGATATTGCTCCTATTAAAATAGGTAATTGCTGCTGGATTGGAGAT 3840 

NAVILAGSEICDGCVIAANS 
AATGCAGTAATTCTGGCTGGTAGTGAAATTTGTGATGGCTGTGTAATCGCAGCTAATTCA 3900 

VVKDLKVDKPCLIG GVPAKV 
GTCGTCAAGGATTTAAAAGTCGATAAGCCATGTTTAATTGGTGGGGTTCCTGCTAAAGTA 3960 

End of wbaL Start of wbaQ

I K V F *

MNVFISICIPSYNRA
ATAAAGGTATTTTAAAATGAATGTTTTTATCAGTATTTGTATACCGTCTTATAATAGAGC 4020

EFLEPLLDS IYNQDYCLKNN 
TGAGTTTTTAGAGCCACTACTGGATAGCATATATAATCAAGATTATTGTTTAAAGAATAA 4080 

DFEVIVCEDKSPQRDEINS I 
TGATTTTGAGGTCATTGTTTGTGAAGATAAATCTCCACAGAGAGATGAGATAAACTCTAT 4140 

IENYKAKNNKQNLYVNFNED 
TATCGAAAACTATAAAGCAAAAAATAATAAACAAAATCTTTATGTTAATTTCAATGAAGA 4200 

NLGYDKNLKKCISLTTGKYC 
TAATTTAGGCTATGATAAGAATTTAAAAAAATGCATTAGTTTGACGACAGGTAAATATTG 4260 

MIMGNDDLLADGALSKIVKV 
CATGATCATGGGCAACGATGATCTATTAGCAGATGGAGCGTTATCAAAAATAGTGAAAGT 4320 



PNELCDTVRHLTDDTLFQPG
TCCGAATGAGTTATGTGATACTGTTCGTCATTTAACAGACGATACTTTATTTCAGCCGGG 

ADAIKF FFRRVGVI SGFIVN 
GGCTGATGCCATTAAATTTTTCTTCCGTAGAGTTGGAGTTATTTCAGGCTTTATTGTCAA 

AEKAKKLSSDLFDGRLYYQM 
TGCTGAAAAAGCAAAAAAACTATCGAGTGATTTATTTGATGGGCGTTTATATTATCAAAT 

YLAGMLMAEGQGYYFSDVMT 
GTACCTTGCTGGTATGCTAATGGCTGAAGGTCAGGGATACTATTTTAGCGACGTGATGAC

LSRDTEAPDFGNAGTEKGVF
ATTGTCGAGGGATACAGAGGCTCCTGACTTTGGTAACGCTGGAACTGAAAAAGGAGTTTT 4680 

TPGGYKPEGRIHMVEGLLL I 
CACCCCGGGGGGGTATAAACCAGAGGGCCGTATACATATGGTTGAAGGCTTGTTGCTAAT 4740 

AKYIEDTTKIDGVYAGIRKD 
TGCAAAATATATAGAAGATACAACAAAAATTGATGGCGTTTATGCTGGAATTAGAAAAGA 4800 

LANYFYPY IRDQLDL PLYTY 
CTTAGCGAACTATTTTTATCCTTATATTCGAGATCAACTCGACTTGCCTCTTTATACTTA 4860 

IKMINKFRKMGFSNEKLFYV 
TATTAAAATGATAAATAAATTTCGGAAAATGGGATTTTCAAATGAAAAGCTTTTCTATC 4920 

HAFLGYVLKRRGYDALIKYI 
GCATGCCTTTTTAGGGTATGTACTAAAACGGAGGGGCTATGATGCTTTAATTAAATACAT 4980 

End of wbaQ 



5040 

TATGAATATACTTCTTGCTGCGATATTAGGCGCTAACTTATTTTCTCCATATATTAGTTC 5100 
Start of wzy 

MLPFP PGAILRDVLNV 
GTGGATGGTGGGTATCCTGCCATTTCCACCAGGAGCAATCCTAAGGGATGTACTCAATGT 5160 

5220 

LVFTIFSWSAVILWVIALTI 
GTTGGTTTTTACTATTTTTTCATGGTCGGCGGTAATACTATGGGTAATAGCGTTAACTAT 5280

FSPDKIQAIMGGRSYILFPA 
ATTCTCACCGGATAAAATTCAAGCAATTATGGGGGGGCGGAGTTATATTTTATTCCCGGC 5340

VFIALVILKVSYPQSLNIEK 
AGTTTTCATAGCATTAGTGATTTTAAAAGTATCATACCCGCAATCCTTAAATATTGAAAA 5400 

IVCYI IFLMFMVATISI IDV 
AATAGTTTGCTACATAATTTTTCTAATGTTTATGGTTGCGACAATATCTATTATTGATGT 5460 

LMNGE F IKL LGYDE HYAGEQ 
ACTAATGAATGGAGAGTTCATTAAATTGCTCGGATATGATGAGCATTATGCAGGAGAACA 5520 

LNLINSYDGMVRATG GFSDA 
ATTAAACTTAATTAATAGCTATGATGGGATGGTCCGGGCTACAGGCGGTTTTAGTGATGC 5580 

LNFGYMLTLGVL LCMECFSQ 
TCTCAATTTTGGATATATGCTCACATTAGGTGTTTTGTTATGTATGGAGTGTTTTTC 5640 

GYKRLLMLIISFVLFIAICM
AGGATATAAAAGATTATTGATGCTTATTATTAGTTTTGTGCTATTTATAGCGATCTGCAT 5700 

SLTRGAIIVAALIYALYIIS
GAGTCTTACTAGAGGAGGAATACTTGTTGCTGCGCTTATTTACGCACTTTATATAATTTC 5760 

N R K M L F C GITLFVIIIPVLA 
AAATCGGAAGATGCTTTTTTGTGGAATAACTTTATTTGTAATAATTATACCCGTTTTAGC 5820

ISTNIFDNYTEILIGRFTDS
AATTTCTACTAATATTTTTGACAACTATACAGAAATTTTGATCGGCAGGTTTACAGATTC 5880 

SQASRGSTQGRIDMAINSLN 
GTCTCAGGCATCGCGTGGATCTACACAGGGGCGGATAGATATGGCAATTAATTCATTAAA 5940

FLSEHPSGIGLGTQGSGNML 
CTTCCTGTCAGAACATCCATCAGGTATAGGTCTGGGTACTCAAGGTTCAGGAAACATGCT 6000 

SVKDNRLNTDNYF FWIALET 
TTCGGTAAAAGATAATAGGTTAAATACGGATAATTATTTTTTCTGGATCGCCCTTGAGAC 6060

GIIGLI INI IYLASQFYSST 
TGGTATTATTGGCTTAATCATAAATATTATTTATCTGGCAAGTCAATTTTATTCTTCAAC 6120

LLNRIYGSHC SNMHYRLYFL 
TTTACTAAATAGAATATATGGCAGTCATTGTAGCAATATGCACTATAGATTATATTTTCT 6180 

FGSIYFISAALSSAPSSSTF 
CTTTGGAAGTATATATTTTATAAGTGCAGCGTTAAGTTCAGCACCTTCGTCATCAACTTT 6240

SIYYWTVLALIPFLKLTNRR 
TTCTATATATTATTGGACAGTTTTAGCTTTGATTCCATTTTTAAAATTAACAAATAGACG 6300

End of wzy Start of wbaW

CTR *MNNKKVLMDISWSNKG
GTGCACGCGATAATGAATAATAAAAAGGTTTTGATGGATATTAGTTGGTCTAATAAAGGG 6360

GIGRFTDEISKLLCDISKEE
GGGATTGGACGTTTTACTGATGAAATTTCTAAACTACTATGTGATATATCTAAGGAGGAA 6420 

LYRKCASPLA PLGLAVNIFL 
CTATATAGAAAATGTGCTTCTCCGCTGGCCCCATTAGGTTTAGCAGTCAATATTTTTCTG 6480

RKKTDVVFLPGYIPPLFCSK 
CGAAAGAAAACTGATGTGGTTTTTCTTCCTGGCTATATTCCACCACTTTTTTGTTCGAAA 6540 

KFIITIHDLNHLDLNDNSSL 
AAGTTCATAATAACAATACATGATCTAAATCATCTGGATTTAAATGATAATTCCTCTCTT 6600 

FKRLFYNFI IKRGCRKAYKI 
TTTAAGAGGTTATTTTATAATTTTATAATAAAGCGCGGTTGTAGAAAAGCATATAAAATA 6660 

FTVSNFSKERIVAWSGVNPN 
TTTACAGTTTCGAATTTTTCAAAAGAAAGAATAGTAGCATGGTCAGGTGTAAACCCTAAT 6720 

KIVTVYNGVS SLFNADVKPL 
AAAATAGTCACGGTATATAATGGGGTATCTAGTCTATTTAATGCCGATGTAAAACCATTG 6780 

NLGYKYL LCVGNRKTHKNEK 
AATTTAGGCTATAAATATTTGCTATGTGTAGGAAACAGAAAAACTCATAAGAATGAGAAG 6840

CVISAFAKADIDPSIKLVFT 
TGTGTTATATCTGCCTTTGCCAAAGCAGATATTGATCCATCAATAAAACTCGTTTTTACT 6900

GNPCNDLEKLI IQHGLSERV 
GGTAATCCTTGTAATGATTTAGAAAAACTAATAATACAACATGGTTTAAGTGAACGTGTA 6960

AAGTTCTTTGGGTTCGTGTCTGAAAAAGATTTACCATCGTTATATAAGGGCTCGTTAGGA 7020

LVFPSLYEGFGLPVVEGMAC 
TTAGTTTTCCCTTCTTTATATGAAGGTTTTGGATTACCTGTAGTGGAGGGCATGGCCTGT 7080 

GIPVLTSDTS SLPEVAGDAA 
GGTATTCCTGTATTAACTTCTCTAACTTCATCATTGCCAGAGGTGGCTGGAGATGCAGCG 7140 

ILVDPLSEDAITKGISR LIN 
ATTCTTGTCGACCCTCTTTCGGAAGATGCTATTACTAAAGGAATTTCGAGGTTAATTAAT 7200 

DS ELRKHL I Q KGL LRAKRFN 
GATTCTGAACTTCGTAAGCATTTAATCCAAAAGGGGCTTTTGCGGGCAAAGAGGTTCAAT 7260 



W Q N V 



Start of wbaZ 



TGGCAAAACGTGGTTAGTGAGATTGAAATGGTACTGACAGAGGCATGTGATSGAAATAAA 



End of wbaW



rGAAATAAAAATATCTCTCGTTCATGAGTGGTTATTAAGTTATGCAGGCTCCGAACAGGT 7380

SSAILHVF PEAKLYSVVDFL 
ATCATCTGCCATCCTGCATGTTTTTCCTGAAGCGAAGTTATATTCGGTGGTTGATTTTCT 7440 

TDEQRRHFLGKYATTTFIQN 
AACGGATGAACAAAGAAGACATTTTCTGGGGAAATATGCGACTACCACATTTATTCAAAA 7500 

LPKAKKFYQKYLPLMPLAIE 
TTTACCTAAAGCTAAAAAATTTTACCAGAAATATTTACCACTAATGCCACTGGCTATTGA 7560 

QLDLSDANI IISSAHSVAKG 
ACAACTTGATTTATCAGATGCTAATATCATCATTAGTAGCGCCCATTCCGTTGCAAAAGG 7620 

VISGPDQLHISYVHSPIRYA

TGTTATTTCCGGACCAGATCAGCTTCACATTAGCTATGTTCATTCTCCTATTCGATATGC 7680 

WDLQHQYLNE SNLNKG I KGW 
GTGGGATTTACAGCATCAGTACCTTAATGAGTCTAACCTGAATAAAGGAATTAAAGGTTG 7740 

LAKWL LHKI R IWDSRTANGV 
GTTAGCAAAATGGCTTCTTCACAAAATACGAATTTGGGATTCTCGAACCGCAAATGGGGT 7800 

DHFIANSQYIARRIKKVYRR 
TGATCATTTTATAGCTAATTCTCAATATATCGCGCGTAGAATTAAAAAAGTATACAGACG 7860 

EASVIYP PVDVDNFEVKNEK 
TGAGGCTTCAGTTATATATCCGCCTGTAGATCTGGATAATTTTGAAGTAAAAAATGAAAA 7920 

QDYYFTASRMVPYKRIDLIV 
GCAAGACTATTATTTCACAGCATCCCGTATGGTACCCTACAAACGTATTGATCTTATTGT 7980 

EAFSKMPEKKLVVIGDGPEM 
CGAAGCCTTTAGTAAAATGCCGGAAAAGAAATTAGTAGTTATTGGTGATGGACCGGAGAT 8040 

KKIKSKATDNIKLLGYQS FP 
GAAAAAAATAAAGAGCAAGGCTACAGACAATATAAAATTGCTCGGTTATCAATCTTTTCC 8100

VLKEYMQSARAFVFAAEEDF
TGTTTTAAAAGAGTATATGCAGAGCGCCAGGGCGTTTGTTTTTGCAGCGGAAGAGGACTT 8160 

GI I PVEAQACGTPVIAFGKG 
TGGAATAATACCTGTCGAAGCTCAAGCTTGCGGTACCCCTGTTATTGCCTTTGGGAAGGG 8220 

GALETVRPLGVEEPTGIFFK 
TGGGGCCTTAGAAACCGTTCGCCCACTAGGTGTAGAGGAACCGACTGGCATTTTCTTCAA 8280 

EQNIASLHEAVSEFEKNASF 
GGAACAGAATATTGCTTCTTTGCATGAAGCTGTTAGTGAATTTGAAAAAAATGCATCATT 8340 

FTSQACRKNAEKFSRSRFEQ 
TTTTACATCTCAGGCTTGTAGAAAAAATGCAGAAAAATTTTCTCGATCAAGATTTGAACA 8400 

EFKNFVNEKWNLFKTEQI IK 
AGAATTTAAGAACTTTGTTAATGAAAAGTGGAATCTTTTCAAAACAGAACAGATTATTAA 8460 

End of wbaZ Start of manC 

MSKLIPVIMAG GI 

R * 

ACGTTAATTATGGTTTATTGAATGTCTAAATTAATACCAGTAATAATGGCCGGTGGGATT 8520 

GSRLWPLSRE EHPKQFLSVD 
GGTAGCCGTTTGTGGCCACTTTCACGTGAAGAGCATCCGAAACAGTTTTTAAGCGTAGAT 8580 

GGTGAA LSM3jQNTIKR1 ' TPLL,AGE 

PLVICNDSHRFLVAEQLRAI 
CCTTTAGTCATTTGTAATGATAGTCACCGCTTCCTTGTCGCTGAACAACTTCGAGCTATA 8700 

NKLANNI ILEPVGRNTAPAI 
AATAAACTAGCAAATAACATCATATTAGAGCCAGTGGGGCGTAATACAGCCCCAGCTATA 8760 

ALAAFCSLQNVVDEDPLLLV 
GCGCTGGCCGCTTTTTGTTCACTTCAGAATGTCGTCGATGAAGACCCGCTTTTGCTTGTC 8820 

LAADHVIRDEKVFLKAINHA 
CTTGCTGCGGATCATGTCATCCGCGATGAGAAAGTGTTTCTTAAAGCTATCAATCACGCT 8880 

EFFATQGKLVTFGIVPTQAE 
GAATTTTTTGCAACACAAGGTAAGCTAGTAACGTTTGGTATTGTACCCACACAGGCCGAA 8940 

TGYGYICRGEAIGEDAFSVA 
ACTGGCTACGGTTATATTTGTAGAGGTGAAGCAATCGGGGAAGATGCTTTTTCTGTAGCC 9000 

EFVEKPDFDTARHYVESEKY 
GAATTTGTAGAGAAGCCTGATTTCGATACAGCGCGTCATTATGTAGAATCAGAGAAATAT 9060 

YWNSGMFLFRAS SYLQELKD 
TATTGGAACAGCGGTATGTTCCTATTTCGTGCAAGTAGTTACTTACAAGAATTAAAGGAT 9120 

LSPDIYQACENAVGSINPDL 
CTGTCCCCCGATATTTACCAAGCATGTGAAAATGCGGTAGGGAGTATTAATCCTGATCTT 9180 

DFIRIDKEAFAMCPSDSIDY 
GATTTTATCCGTATTGATAAAGAAGCATTCGCAATGTGCCCTAGTGATTCTATCGATTAT 9240 
AVMEHTRHAVVVPMNAGWSD 
GCGGTAATGGAACATACTAGGCATGCAGTTGTCGTACCGATGAATGCCGGCTGGTCAGAT 9300 

VGSWSSLWDISKKDPQRNVL 
GTGGGGTCATGGTCTTCACTGTGGGATATTTCTAAGAAAGATCCACAACGTAATGTATTA 9360 

HGDIFAYNSKDNYIYSEKSF 
CATGGCGATATTTTTGCATATAATAGTAAAGATAATTATATCTATTCTGAAAAATCGTTT 9420 

ISTIGVNNLVIVQTADALLV 
ATTAGTACAATCGGAGTAAATAATTTAGTTATCGTGCAGACAGCAGATGCATTATTAGTA 9480 

SDKDSVQDVK KVVDYLKANN 
TCTGATAAAGATTCAGTCCAGGATGTTAAAAAAGTTGTTGATTATTTAAAAGCTAATAAT 9540 

RNEHKKHLEVFRPWGKFSVI 
AGAAACGAACATAAAAAACATTTAGAGGTTTTCCGACCGTGGGGAAAATTTAGCGTAATT 9600 

HSGDNYLVKRITVKPGAKFA 
CATAGTGGCGATAATTATTTAGTTAAAAGAATAACTGTTAAACCAGGCGCGAAGTTTGCT 9660 

AQMHLHRAEHWIVVSGTACI 
GCTCAGATGCATCTCCATCGTGCTGAGCATTGGATAGTGGTATCTGGTACTGCTTGTATT 9720 

TKGEEIFTISENESTFIPAN 
ACTAAGGGGGAAGAAATTTTTACAATTTCGGAGAATGAATCAACATTTATACCTGCTAAT 9780 

TVHTLKNPAT I PLELIEIQ S 
ACAGTTCATACGTTAAAAAACCCCGCGACTATTCCATTAGAACTAATAGAAATTCAATCT 9840 

GTYLAEDDI IRLEKHSGYLE 
GGCACCTATCTTGCGGAGGATGATATTATTCGCCTGGAGAAACATTCTGGATATCTGGAG 9900 

End of manC Start of manB 

MKNIYNTYDVINKSGIN 
TAATGAATTGATGAAAAATATATATAATACTTACGATGTTATCAACAAATCTGGAATTAA 9960 

FGTSGARGLVTDFTPEVCAR 
TTTTGGAACCAGTGGTGCCCGCGGCCTTGTTACCGATTTTACACCCGAAGTTTGCGCACG 10020 

FTISFLTVMQ QRFSFTTVAL 
ATTTACCATTTCCTTTTTGACAGTAATGCAGCAAAGATTCTCATTTACAACGGTTGCGCT 10080 

AIDNRPS SYAMAQACAAALQ 
CGCAATTGATAATCGTCCAAGCAGTTACGCGATGGCTCAAGCTTGTGCCGCTGCTTTGCA 10140 

EKGIKTVYYGVIPTPALAHQ 
AGAAAAAGGAATTAAAACCGTTTACTATGGCGTAATTCCAACACCTGCTTTAGCTCATCA 10200 

SISDKVPAIMVTGSHIPFDR 
ATCAATTTCCGATAAAGTACCTGCAATCATGGTTACTGGCAGTCATATCCCTTTTGACCG 10260 

NGLKFYRPDGEITKD DENAI 
TAATGGCCTGAAATTTTATAGACCAGATGGTGAAATTACTAAAGATGATGAGAATGCTAT 10320 

IHVDASFMQ PKLEQLTISTI 
TATTCATGTTGATGCCTCATTTATGCAGCCTAAGCTTGAACAATTGACAATTTCCACAAT 10380 
CGCTGCTAGAAATTATATTCTACGATATACCTCATTATTTCCAATGCCATTCTTGAAAAA 10440 

KRIGIYEHS SAGRDLYKTLF 
TAAGCGCATTGGAATTTATGAGCATTCTAGTGCGGGTCGTGATCTCTATAAGACGTTATT 10500 

KMLGATVVSLARSDEFVPID 
CAAAATGTOGGGTGCTACAGTTGTTAGTTTAGCAAGGAGCGACGAATTTGTTCCTATTGA 10560 

TEAVSEDDRNKAITWAKKYQ 
TACTGAAGCTGTAAGTGAAGATGATAGAAATAAAGCAATCACATGGGCAAAAAAATATCA 10620 

LDAIFSTDGDGDRPLIADEY 
GTTAGATGCTATATTOTCAACTGATGGTGATGGAGATCGCCCTCTGATAGCTGACGAATA 10680 



10740 



TGCAGTCGCTATTCCTGTAAGCTGCAACAGTACAATCTCATCTGGTAACTTTTTTAAACA 10800 

VERTKIGSPYVIAAFAKLSA 
TGTGGAACGAACAAAGATTGGTTCACCCTATGTGATTGCAGCATTTGCTAAATTATCTGC 10860 



10920 



TTATATTAATCAGCGTTTACTTAAGGCATTACX:AACACGTGATGCTTTATTACCTGCCAT 10980 

MLLFGSKDKS ISELVK KLPA 
TATGCTTCTGTTTGGTAGCAAGGACAAAAGTATTAGTGAGCTTGTTAAAAAACTTCCTGC 11040 

RYTYSNRLQDI SVKTSMSL I 
TCGCTATACCTATTCAAACAGATTACAGGATATAAGTGTTAAAACAAGTATGTCTTTAAT 11100 

NLGLTDQEDFLQYIGFNKH h 
AAATCTTOGTCTGACAGATC^GAGGATTTTTTGCAGTAT 11160 

ILHSDVTDGFRITIDNNNI I 
TATATTACATTC^ATGTTACTGATGGCTTTAGAATCACTATCGATAACAACAATATTAT 11220 

HLRPSGNAPELRCYAEADSQ 
TCATTTACGACCTTCAGGCAATGCCCCTGAGTTGCGTTGCTATGCGGAGGCTGACTCGCA 11280 

EDACNIVETVLSNIKSKLGR 
AGAGGATGCATGTAATATTGTTGAAACTGTTCTCTCTAATATCAAAAGCAAACTGGGTAG 11340 

End of manB 

A * 

AGCTTAATGCTGTTGATAATAGAGCGTTTCTTTCCAGTAATACTTTGTCTGGTTATCTGG 11400 
Start of wbaP 

MDRFDNKYNPNL 
TACCCAAGTTGAGGGTGAGAATTAAATGGATCGTTTTGATAATAAGTATAACCCAAATTT 11460 

CKILLA ISDLLFFNVALWAS 
ATGCAAAATATTATTGGCTATATCAGATTTACTGTTTTTTAATGTAGCCTTATGGGCATC 11520 
LGVVYLIFDEVQRFVPQEQL 
GTTAGGAGTTGTATATTTAATCTTTGATGAAGTTCAGCGATTTGTACCACAAGAGCAATT 11580 

DNRFISHFILSIVCVGWFWV 
AGATAATCGATTTATATCACATTTTATTCTATCTATAGTATGCGTTGGATGGTTTTGGGT 11640 

RLRHYTYRKPFWYELKEVIR 
TCGACTGCGTCACTATACATATCGAAAGCCATTCTGGTATGAGTTGAAAGAGGTTATTCG 11700 

TIVIFAVFDLALIAFTKWQF 
TACTATCGTTATTTTTGCTGTGTTTGATTTGGCTTTAATTGCGTTTACAAAATGGCAGTT 11760 



ALTKHLLNKLGIWKKKTIIL 
CGCACTTACAAAGCATTTATTGAACAAGCTAGGTATCTGGAAGAAAAAAACTATCATCCT 11880 

GSGQNARGAYSALQSE EMMG 
TGGGAGCGGACAGAATGCTCGTGGTGCATATTCTGCGCTGCAAAGTGAGGAGATGATGGG 11940 

FDVIAF FDTDASDAE INMLP 
GTCTGATGTTATCGCTTTTTTTGATACGGATGCGTCAGATGCTGAAATAAATATGTTGCC 12000 

VIKDTETIWDLNRTGDVHYI 
GGTGATAAAGGACACTGAGACTATTTGGGATTTAAATCGTACAGGTGATGTCCATTATAT 12060 

LAYEYTELEKTHFWLRELSK 
CCTTGCTTATGAATACACCGAGTTGGAGAAAACACATTTTTGGCTACGTGAACTTTCAAA 12120 

HHCRSVTVVPSFRGLPLYNT 
ACATCATTGTCGTTCTGTTACTGTCGTCCCCTCGTTTAGAGGATTGCCATTATATAATAC 12180 

DMSFI FSHEVMLLR I QNNLA 
TGATATGTCTTTTATCTTTAGCCATGAAGTTATGTTATTAAGGATACAAAATAACTTGGC 12240 

KRSSRFLKRTFDIVCSIMIL 
TAAAAGGTCGTCCCGTTTTCTCAAACGGACATTTGATATTGTTTGTTCAATAATGATTCT 12300 

IIASPLMIYLWYKVTRDGGP 
TATAATTGCATCACGACTTATGATTTATCTGTGGTATAAAGTTACTCGAGATGGTGGTCC 12360 

AIYGHQRVGRHGKLFPCYKF 
GGCTATTTATGGTCACCAGCGAGTAGGTCGGCATGGAAAACTTTTTCCATGCTACAAATT 12420 

R S M V M N S 
TCGTTCTATGGTTATGAATTC 12441 
GAATTCGGGAGGCGCAATGAAAGTCAGCTTTTTTCTGCTGAAATTTCCACTCTCATCGGA 60 

AACCTTTGTGCTGAATCAGATTACTGCGTTTATTGATATGGGCCATGAGGTGGAGATTGT 120 

CGCGTTACAAAAAGGCGATACCCAACATACTCACGCCGCCTGGGAGAAGTATGGCCTGGC 180 

GGCGAAAACCCGCTGGTTACAGGATGAGCCCCAGGGACGGCTGGCGAAACTGCGCTACCG 240 

GGCATGTAAAACGCTGCCGGGGCTGCATCGGGCGGCGACCTGGAAAGCGCTCAATTTTAC 300 

CCGCTATGGCGATGAATCACGCAATTTGATCCTTTCCGCGATTTGCGCGCAGGTGAGCCA 360 

GCCTTTTGTGGCGGATGTGTTTATCGCACACTTTGGTCCGGCGGGCGTGACGGCGGCCAA 420 

ACTACGCGAACTGGGCGTGCTTCGCGGCAAAATCGCGACTATTTTCCACGGGATTGATAT 480 

CTCTAGTCGTGAGGTGCTCAGTCATTACACGCCGGAGTATCAGCAGTTGTTTCGTCGTGG 540 

CGATCTGATGCTGCCCATCAGCGATCTGTGGGCCGGTCGCCTGAAAAGTATGGGCTGTCC 600 

GCCGGAAAAGATTCCCGTTTCGCGCATGGGCGTCGACATGACGCGTTTTACCCATCGTTC 660 

GGTGAAAGCGCCAGGGATGCCGCTGGAGATGATTTCCGTCGCGCGCCTGACAGAAAAAAA 720 

AGGCCTGCATGTGGCGATTGAAGCCTGTCGGCAACTGAAAGCACAGGGCGTGGCGTTTCG 780 

CTACCGCATTCTGGGGATTGGCCCGTGGGAACGTCGGCTGCGCACGCTCATCGAGCAGTA 840 

TCAGCTAGAGGATGTCATTGAGATGCCGGGGTTTAAACCGAGCCATGAAGTGAAGGCGAT 900 

GCTGGATGACGCCGATGTTTTTTTGCTGCCGTCGATTACCGGTACGGATGGCGATATGGA 960 

AGGTATTCCGGTAGCGCTGATGGAGGCGATGGCGGTAGGGATTCCCGTGGTATCTACCGT 1020 

GCATAGCGGTATTCCGGAACTGGTGGAGGCCGGCAAATCCGGCTGGCTGGTGCCGGAAAA 1080 

CGATGCGCAGGCGCTGGCGGCCCGACTCGCTGAGTTCAGCCGGATTGACCACGACACGCT 1140 

GGAGTCGGTGATCACGCGCGCCCGTGAAAAAGTGGCGCAAGATTTTAATCAGCAGGCGAT 1200 

TAATCGCCAGTTAGCCAGCCTGCTACAAACGATATAAACGAGGTGGTATGCCCGCGACTA 1260 

AATTCTCCCGACGTACCCTCCTGACGGCAGGTTCTGCGCTTGCTGTTCTTCCTTTTCTGC 1320 

GCGCCTTGCCGGTACAGGCGCGTGAACCTCGCGAGACCGTCGATATTAAGGATTATCCGG 1380 

CGGATGACGGTATCGCCTCGTTCAAACAGGCCTTCGCCGACGGACAGACCGTGGTCGTAC 1440 

CGCCAGGATGGGTGTGTGAAAATATCAATGCGGCGATAACGATTCCGGCGGGAAAAACGC 1500 

TGCGGGTACAGGGCGCGGTGCGTGGGAATGGCCGGGGACGGTTTATTTTGCAGGACGGGT 1560 

GTCAGGTGGTGGGGGAGCAGGGCGGCAGTCTGCACAATGTGACGCTGGATGTTCGCGGGT 1620 

CGGACTGTGTGATTAAAGGCGTGGCGATGAGCGGCTTTGGCCCCGTCGCGCAAATTTTCA 1680 

TCGGTGGTAAGGAACCGCAGGTGATGCGTAATCTCATTATCGATGACATCACCGTTACCC 1740 

ACGCCAACTACGCCATTCTCCGCCAGGGATTTCATAACCAAATGGATGGCGCGCGGATTA 1800 

CGCATAGCCGCTTTAGCGATTTACAGGGGGACGCCATTGAGTGGAATGTCGCGATTCACG 1860 

ACCGCGACATCCTGATTTCCGATCATGTCATCGAACGCATTAATTGTACCAATGGCAAAA 1920 

TCAACTGGGGGATCGGCATCGGGCTGGCGGGTAGCACCTATGACAACAGTTATCCTGAAG 1980 
ACCAGGCAGTAAAAAACTTTGTGGTGGCCAATATTACCGGATCTGATTGCCGACAGCTTG 2040 

TGCACGTAGAAAATGGCAAACATTTCGTCATTCGCAATGTCAAAGCCAAAAACATCACGC 2100 

CCGGTTTCAGTAAAAATGCGGGTATTGATAACGCAACGATCGCAATTTATGGCTGTGATA 2160 

ATTTCGTCATTGATAATATTGATATGACGAATAGTGCCGGGATGCTCATCGGCTATGGCG 2220 

TCGTTAAAGGAAAATACCTGTCAATTCCGCAAAACTTTAAATTAAACGCTATTCGGTTGG 2280 

ATAATCGCCAGGTTGCTTATAAATTACGCGGCATTCAAATTTCCTCCGGCAACACCCCCT 2340 

CTTTTGTCGCCATCACCAATGTACGGATGACGCGTGCTACGCTGGAACTGCATAATCAAC 2400 

CGCAGCACCTCTTTCTGCGCAATATCAACGTGATGCAAACTTCAGCGATTGGCCCGGCGT 2460 

TAAAAATGCATTTCGATTTGCGTAAAGATGTACGTGGTCAATTTATGGCCCGCCAGGACA 2520 

CGCTGCTTTCCCTCGCTAATGTTCATGCCATCAATGAAAACGGGCAGAGTTCCGTGGATA 2580 

TCGACAGGATTAATCACCAAACCGTGAATGTCGAAGCAGTGAATTTTTCGCTGCCGAAGC 2640 

GGGGAGGGTAAGTACCGCTATTTTTACGAAAATTCCTGGGAAAAAGTTGTTCATACTTAA 2700 

TGTTATGGTGCCGACTAAGACGTAATGTAGAGCGTGCCATCATTATCCCTGGCAGCAGAG 2760 

TAATTCATGCTGGCGAAAACAAGCTAAAGAGCTATAATTCAGCAACCATTTTACAGGTGG 2820 

AAGAAACAATGATGAATTTGAAAGCAGTTATACCGGTAGCGGGTTTGGGTATGCATATGT 2880 

TGCCTGCCACCAAGGCAATCCCAAAAGAGATGCTACCGATCGTCGACAAGCCAATGATTC 2940 

AGTACATTGTCGATGAGATTGTGGCTGCAGGGATCAAAGAAATCGTGCTGGTGACTCACG 3000 

CGTCTAAAAACGCCGTTGAGAACCACTTCGACACCTCTTATGAACTTGAATCACTTCTTG 3060 

AGCAGCGCGTTAAGCGTCAGCTTTTGGCGGAAGTGCAATCTATCTGCCCACCGGGCGTGA 3120 

CGATTATGAACGTTCGCCAGGCGCAGCCGTTAGGGCTGGGGCATTCTATTCTGTGCGCGC 3180 

CTACCGCCGATCCGCTGCGCTATAACCTTGCGGCGATGGTGGCGCGTTTCAATGAAACGG 3300 

GTCGCAGCCAGGTGCTGGCGAAGCGCATGAAAGGTGATTTATCGGAGTATTCCGTTATCC 3360 

AGACGAAAGAACCTCTGGATAATGAAGGCAAAGTCAGCCGGATTGTGGAGTTTATCGAAA 3420 

AACCGGATCAGCCGCAGACGCTGGATTCCGATTTGATGGCGGTAGGCCGTTATGTGCTTT 3480 

CAGCCGACATCTGGGCGGAACTGGAAAGAACCGAACCGGGCGCCTGGGGCCGCATCCAGC 3540 

TCACCGATGCCATTGCTGAACTGGCGAAAAAACAGTCGGTTGACGCGATGCTAATGACGG 3600 

GTGACAGCTATGACTGCGGTAAAAAAATGGGCTACATGCAGGCATTTGTGAAGTACGGGC 3660 

TGCGCAACCTGAAAGAAGGAGCCAAGTTCCGTAAGAGCATAGAGCAGCTTTTGCATGAAT 3720 

AAGTATTAACAACCGTGATAAATGGTTGGTGATAAACATAATAACGGCAGTGAACATTCG 3780 

AAGCGGCAAGTTGGCTGAAACGAGTGTTGACTGCCGTTTTAGTTTTGTATAAAGGGCTTA 3840 

AGTAACAAGGGGTTATCTGGAGCATTTTAATGCTGATTTTATAAGATTAATCCTTGTTTC 3900 

CGGATGCAATTAATAAGACAATTAGCGTTTAAGTTTTAGTGAGCTTTGCCCTGCTGGGCG 3960 
AGGTTTGCAACAAGTCGATATGTACGCAGTGCACTGGTAGCTGATGAGCCAGGGGCGGTA 4020 
GCGTGTGTAACGACTTGAGCAATTAATTTTTATTGGCAAATTAAATACCACATTAAATAC 4080 

Start of rmlB 

VKILITGGAGFIGS 
GCCTTATGGAATAGAAAAGTGAAGATACTTATTACTGGCGGGGCAGGTTTTATTGGATCA 4140 

AVVRHI IKNTQDTVVNIDKL 
GCTGTTGTCCGCCATATTATTAAGAATACACAGGACACTGTAGTTAATATTGATAAATTA 4200 

TYAGNLESLSDI SESNRYNF 
ACCTACGCCGGTAATCTTGAATCCCTTTCTCATATTTCTGAAAGTAATCGCTACAATTTT 4260 

EHADICDSAEITRI FEQYQP 
GAACACGCGGATATTTGTGATTCCGCTGAAATAACGCGTATTTTTGAGCAGTACCAGCCG 4320 

DAVMHLAAESHVDRSITGPA 
GACGCGGTGATGCATTTGGCTGCGGAAAGTCATGTGGACCGTTCGATTACCGGGCCAGCA 4380 

AFIETNIVGTYAL LEVARKY 
GCATTTATTGAAACCAATATCGTCGGCACCTATGCACTTCTTGAAGTTGCGCGTAAATAC 4440 

WSALGEDKKKNFRFHHISTD 
TGGTCTGCCCTTGGCGAAGATAAAAAAAATAATTTTCGTTTTCATCATATTTCCACTGAT 4500 

EVYGDLPHPDEVENSVTLPL 
GAAGTTTACGGCGATTTACCGCATCCTGATGAAGTTGAAAACAGCGTTACGCTGCCGTTA 4560 

FTETTAYAPSSPYSASKASS 
TTTACTGAAACGACGGCATATGCGCCAAGTAGCCCCTATTCTGCGTCAAAAGCATCCAGC 4620 

DHLVRAWRRTYGLPT IVTNC 
GATCATTTAGTCCGTGCCTGGCGGCGTACCTATGGTCTACCAACGATCGTTACCAATTGT 4680 

SNNYGPYHFPEKLIPLVILN 
TCTAATAACTATGGCCCTTATCACTTCCCTGAAAAACTGATTCCGTTGGTCATTTTGAAC 4740 

ALEGKPLPIYGKGDQIRDWL 
GCACTGGAAGGAAAGCCTTTGCCAATTTATGGCAAAGGGGATCAGATTCGCGATTGGCTA 4800 

YVEDHARALHMVVTEGKAGE 
TATGTAGAAGATCATGCTCGCGCGCTTCATATGGTAGTGACTGAAGGCAAGGCAGGGGAG 4860 

TYNIGGHNEKKNLDVVFTIC 
ACTTATAACATTGGTGGACACAATGAGAAGAAAAATCTCGATGTGGTATTTACCATCTGT 4920 

DLLDEIVPKATSYREQITYV 
GATCTGCTGGATGAGATTGTACCCAAAGCGACTTCTTATCGTGAACAAATCACTTATGTC 4980 

A DRPGHDRRYAIDAGKISRE 
GCGGATCGTCCGGGCCATGATCGTCGTTATGCCATTGATGCAGGTAAAATTAGCCGCGAA 5040 



TTAGGCTGGAAACCGCTGGAGACCTTTGAAAGCGGTATTCGTAAAACAGTGGAATGGTAC 5100 

LANTQWVNNVKSGAYQSWIE 
CTTGCAAATACTCAATGGGTAAACAATGTTAAAAGTGGGGCGTATCAGAGTTGGATAGAA 5160 

End of rmlB Start of rmlD 

E G R Q * 
CAGAACTATGAAGGACGCCAGTAATGAATATCTTACTTTTTGGTAAGACAGGGCAAGTAG 5220 
GWELQRSLAPVGNLIALDVH 
GCTGGGAGTTGCAACGTTCTCTGGCACCGGTAGGGAATCTGATTGCCCTGGATGTCCATT 5280 

SKEFCGDFSNPKGVAETVRK 
CAAAAGAGTTTTGCGGTGATTTTAGTAATCCGAAAGGCGTTGCCGAAACCGTTCGTAAGC 5340 

LRPDVIVNAAAHTAVDKAES 
TTCGTCCCGATGTGATTGTTAACGCAGCAGCCCATACTGCAGTAGATAAAGCAGAGTCTG 5400 

EPELAQLLNATSVEAIAKAA 
AACCAGAACTGGCGCAGTTACTTAACGCCACCAGTGTGGAAGCCATCGCTAAAGCAGCCA 5460 

NETGAWVVHYSTDYVF PGTG 
ACGAAACTGGCGCATGGGTAGTGCATTATTCAACCGATTATGTATTTCCTGGTACCGGCG 5520 

DIPWQETDATSPLNVYGKTK 
ATATCCCATGGCAGGAAACGGACGCTACGTCGCCGCTGAATGTCTATGGCAAAACCAAAC 5580 

LAGEKALQDNCPKHLIFRTS 
TGGCGGGAGAAAAGGCCCTGCAGGATAACTGCCCTAAACACCTTATCTTCCGCACCAGTT 5640 

WVYAGKGNNFAKTMLRLAKE 
GGGTTTATGCAGGTAAGGGCAATAATTTCGCAAAGACAATGCTTCGTCTGGCGAAAGAGC 5700 

RQTLSVINDQYGAPTGAEL L 
GTCAGACACTTTCAGTCATTAACGATCAGTACGGTGCGCCAACCGGTGCGGAATTACTGG 5760 

ADCTAHAIRVALNKPEVAGL 
CTGACTGTACGGCGCATGCGATCCGTGTGGCGTTAAATAAACCAGAAGTCGCAGGTCTTT 5820 

YHLVAGGTTTWHDYAALVFD 
ACCATCTGGTTGCCGGGGGAACCACAACCTGGCATGACTACGCGGCCTTAGTCTTTGACG 5880 

EARKAGITLALTELNAVPTS 
AGGCGCGCAAAGCAGGGATAACGCTTGCGCTGACTGAGCTTAATGCTGTGCCGACCAGCG 5940 

AYPTPASRPGNSRLNTEKFQ 
CCTACCCGACGCCGGCGAGCAGACCAGGCAATTCGCGTCTCAATACTGAAAAGTTTCAGC 6000 

RNFDLILPQWELGVKRMLTE 
GTAATTTTGACCTTATTCTGCCTCAATGGGAATTAGGAGTTAAGCGTATGCTGACTGAAA 6060 

End of rmlD 

MFTTTTI * 

TGTTTACGACGACAACCATCTAATAAATTTAAATGCCCATCAGGGCATTTTCTATGAATG 6120 

Start of rmlA 

MKTRKGIILAGGSGTRL 
AGAAATGGAAATGAAAACGCGTAAGGGCATTATTTTAGCGGGGGGCTCCGGCACCCGTCT 6180 

YPVTMAVSKQL LP IYDKPM I 
TTATCCGGTGACCATGGCGGTAAGTAAGCAATTGCTACCAATTTATGATAAACCGATGAT 6240 

YYPLSTLMLAGIRDILIIST 
TTACTATCCCCTTTCCACGCTTATGCTGGCAGGCATTCGGGATATCCTGATCATCAGTAC 6300 

PQDTPRFQQLLGDGSQWGLN 
GCCACAGGACACGCCGCGTTTTCAACAACTGCTGGGAGACGGCAGCCAGTGGGGGCTGAA 6360 

LQYKVQPSPDGLAQAFI IGE 
TCTTCAATATAAAGTACAGCCAAGTOC^AT^ 6420 

EFIGHDDCALVLGDNIFYGH 
AGAGTTCATTGGTCATGATGATTGTGCATTAGTGCTGGGTGACAATATCTTCTATGGTCA 6480 
DLPKLMEAAVKKESGATVFA 
TGATTTACCAAAGTTAATGGAAGCTGCCGTTAATAAAGAAAGTGGTGCTACCGTCTTCGC 

YHVNDP ERYGVVEFDQKGTA 
TTATCATGTAAACGATCCGGAGCGCTACGGTGTGGTTGAGTTTGACCAAAAGGGCACAGC 

VSLEEKPLQPKSNYAVTGLY 
CGTTAGTCTGGAAGAAAAACCATTACAACCGAAGAGTAATTACGCGGTAACGGGGCTGTA 

FYDNSVVEMAKNLKPSARGE 
TTTTTATGATAATAGCGTGGTGGAGATGGCGAAAAATCTTAAGCCTTCCGCTCGCGGTGA 

LEITDINRIYMEQGRLSVAM 
GTTAGAAATCACGGATATTAACCGTATCTATATGGAGCAGGGAAGATTGTCTGTCGCTAT 

MGRGYAWLDTGTHQSLIEAS 
GATGGGGCGCGGTTATGCCTGGCTGGATACAGGGACGCATCAGAGTTTGATAGAGGCCAG 

NFIATI E ERQGLKVSCPEEI 
TAATTTTATTGCAACCATCGAAGAACGCCAGGGGCTAAAAGTGTCCTGCCCGGAAGAGAT 

AFRKNFINAQQVIELAGPLS 
CGCATTTCGTAAAAATTTTATAAATGCACAACAGGTTATAGAACTGGCCGGGCCATTATC 

End of rmlA Start of rmlC 

KNDYGKYLLKMVKGL*VMIV 
AAAAAATGATTATGGCAAATATTTGCTGAAGATGGTGAAAGGTTTATAAGTGATGATTGT 

IKTAIPDVLILEPKVFGDER 
GATTAAAACAGCAATACCAGATGTCTTGATCTTAGAGCCTAAAGTTTTTGGCGATGAGAG 

GFFFESYNQQTFEELIGRKV 
GGGATTCTTTTTTGAAAGTTATAACCAGCAGACCTTTGAAGAGTTGATTGGACGTAAAGT 

TFVQDNHSKSKKNVLRGLHF 
TACATTTGTTCAAGATAATCATTCAAAATCCAAAAAGAACGTACTCAGAGGGCTACATTT 

QRGENAQGKLVRCAVGEVFD 
TCAGAGAGGAGAAAATGCACAGGGGAAGTTAGTTCGTTGTGCTGTCGGTGAGGTTTTTGA 

VAVDIRKESPTFGQWVGVNL 
TGTTGCGGTCGATATCCGAAAAGAATCGCCTACTTTTGGTCAATGGGTTGGTGTAAATCT 

SAENKRQLWIPEGFAHGFVT 
GTCTGCTGAGAATAAGCGACAGCTTTGGATTCCAGAAGGTTTTGCTCATGGTTTTGTTAC 

LSEYAEFLYKATNYYSPSSE 
TCTTAGTGAGTATGCAGAGTTTCTGTACAAAGCAACTAATTATTACTCACCTTCATCGGA 



AGGTAGCATTCTATGGAATGATGAGGCAATAGGTATTGAATGGCCTT' 



"TTTCTCAGCTGCC 



End of rmlC 



TGAGCTTTCAGCAAAAGATGCTGCAGCACCTTTACTGGATCAAGCCTTGTTAACAGAG 1 



Start of ddhD 



AGCATCGJSTCTCATATTATTAAGATTTTTCCATCAAATATTGAATTTTCCGGTAGAGAG 

DESILDAALSAGIHLEHSCK 
GATGAATCAATCCTCGATGCTGCGCTATCGGCTGGTATCCATCTTGAACATAGCTGCAAA 

AGDCGICESDLLAGEVVDSK 
GCGGGTGATTGTGGTATCTGTGAGTCCGATTTGTTGGCGGGAGAAGTTGTTGACTCCAAA 
GNIFGQGDKILTCCCKPKTA 
GGTAATATTTTTGGACAGGGTGATAAAATACTAACCTGCTGCTGTAAACCTAAAACCGCC 7800 

LELNAHF FPELAGQT K KIVP 
CTTGAGCTAAATGCGCATTTTTTTCCTGAACTAGCTGGACAGACAAAAAAAATTGTCCCA 7860 

CKVNSAVLVSG DVMTLKLRT 
TGCAAGGTAAATAGTGCTGTACTGGTTTCAGGCGATGTTATGACTTTGAAGTTACGCACA 7920 

PPTAKIGFLPGQYINLHYKG 
CC^CCAACAGCAAAAATTGGCTTCCTTCCAGGGCAGTATATCAATTTACATTATAAAGGT 7980 

VTRSYSIANSDESNGI ELHV 
GTAACTCGCAGTTATTCTATCGCTAATAGTGATGAGTCGAATGGTATTGAGTTGCATGTA 8040 

RNVPNGQMS SLIFGELQENT 
AGGAATGTTCCCAATGGTCAGATGAGTTCGCTCATTTTTGGGGAGTTACAAGAAAATACT 8100 

LMRIEGPCGTF FIRESDRPI 
CTTATGCGCATTGAAGGGCCTTGCGGAACATTTTTTATTCGTGAAAGTGACAGACCTATA 8160 

IFLAGGTGFAPVKSMVEHLI 
ATCTTCCTTGCAGGCGGTACTGGATTCGCTCCAGTTAAATCAATGGTTGAGCATCTCATT 8220 

QGKCRREIYIYWGMQYSKDF 
CAGGGAAAATGTCGTCGTGAGATCTACATTTACTGGGGAATGCAATATAGTAAAGATTTT 8280 

YSALPQ QWS EQHDNVHYI PV 
TACTCTGCATTACCGCAGCAGTGGAGTGAACAGCACGACAACGTTCATTATATCCCTGTT 8340 

VSGDDAEWGGRKGFVHHAVM 
GTTTCTGGTGATGACGCCGAATGGGGGGGAAGAAAGGGATTTGTCCATCATGCCGTGATG 8400 

DDFDSLEFFDIYACGSPVMI 
GATGATTTTGATTCTCTAGAGTTCTTCGATATATATGCATGTGGTTCACCTGTGATGATC 8460 

DASKKDFMMKNLSVEHFYSD 
GATGCCAGTAAAAAGGACTTTATGATGAAAAATCTCTCTGTAGAACATTTCTATTCTGAT 8520 

End of ddhD Start of ddhA 

AFTASNNIEDNL* 

MKAVILAG 
GCATTTACCGCATCTAATAATATTGAGGATAATTTATGAAAGCGGTCATCCTGGCTGGTG 8580 

GLGTRLSEETIVKPKPMVEI 
GACTTGGTACCAGACTAAGTGAAGAAACAATTGTAAAACCAAAACCGATGGTAGAAATTG 8640 

GGKPILWHIMKMYSVHGIKD 
GTGGCAAGCCTATTCTTTGGCACATTATGAAAATGTATTCTGTGCATGGTATCAAGGATT 8700 

PI ICCGYKGYVIKEYFANYF 
TTATTATCTGCTGTGGTTATAAAGGATATGTGATTAAAGAATATTTTGCGAACTACTTCC 8760 

LHMSDVTFHMAENRMEVHHK 
TTCACATGTCAGATGTAACATTCCATATGGCTGAAAACCGTATGGAAGTTCACCATAAAC 8820 

RVEPWNVTLVDTGDS SMTGG 
GTGTTGAACCATGGAATGTCACATTGGTTGATACGGGTGATTCTTCAATGACTGGTGGTC 8880 

RLKRVAEYVKDDEAFLFTYG 
GTCTCAAACGTGTTGCTGAATACGTAAAAGATGACGAGGCTTTCCTGTTTACTTATGGTG 8940 

DGVADLDIKATIDFHKAHGK 
ATGGCGTTGCCGACCTTGATATCAAAGCGACTATCGATTTCCATAAGGCTCACGGTAAGA 9000 
KATLTATFPPGRFGALDIRA 
AAGCGACTTTAACAGCTACTTTTCCACCAGGACGCTTTGGCGCATTAGATATCCGAGCTG 9060 

GQVRSFQEKPKGDGAMING G 
GTCAGGTCCGGTCATTCCAGGAAAAACCGAAAGGCGATGGGGCAATGATCAATGGTGGTT 9120 

FFVLNPSVIDLIDNDATTWE 
TCTTTGTGTTGAATCCATCGGTTATCGATCTCATCGATAACGATGCAACAACCTGGGAAC 9180 

QEPLMTLAQQGELMAFEHPG 
AAGAGCCATTAATGACATTGGCACAACAGGGGGAGTTAATGGCTTTTGAACACCCAGGTT 9240 

FWQPMDTLRDKVYLEGLWEK 
TCTGGCAGCCGATGGATACCCTACGTGATAAAGTTTACCTCGAAGGGCTGTGGGAAAAAG 9300 

End of ddhA Start of ddhB 

MIDKNFWQG 

GKAPWKTWE* 

GTAAAGCTCCGTGGAAAACCTGGGAGTAACTAGATGATTGATAAAAATTTTTGGCAAGGT 9360 

KRVFVTGHTGFKGSWLSLWL 
AAACGTGTATTCGTTACCGGCCATACTGGCTTTAAAGGAAGCTGGCTTTCGCTATGGCTG 9420 

TEMGAIVKGYALDAPTVPSL 
ACTGAAATGGGTGCAATTGTAAAAGGCTATGCACTTGATGCGCCAACTGTTCCAAGTTTA 9480 

FEIVRLNDLMESHIGDIRDF 
TTTGAGATAGTGCGTCTTAATGATCTTATGGAATCTCATATTGGCGACATTCGTGATTTT 9540 

EKLRNSIAEFKPEIVFHMAA 
GAAAAGCTGCGCAATTCTATTGCAGAATTTAAGCCAGAAATTGTTTTCCATATGGCAGCC 9600 

QPLVRLSYEQ PIETYSTNVM 
CAGCCTTTAGTGCGCCTATCTTATGAACAGCCAATCGAAACATACTCAACAAATGTTATG 9660 

GTVHLLETVKQVGNIKAVVN 
GGTACTGTCCATTTGCTTGAAACAGTTAAGCAAGTAGGTAACATAAAGGCAGTCGTAAAT 9720 

I T S DKCYDNR EWVWGYRENE 
ATCACCAGTGATAAGTGCTACGACAATCGTGAGTGGGTGTGGGGCTATCGTGAGAACGAA 9780 

PMGGYDPYSKSKGCAELVAS 
CCCATGGGAGGGTACGATCCATACTCTAATAGTAAAGGTTGTGCAGAATTAGTCGCGTCT 9840 

AFRNSFFNPANYEQHGVGLA 
GCATTCCGGAACTCATTCTTCAATCCTGCAAATTATGAGCAACATGGCGTTGGTTTGGCG 9900 

SVRAGNVIGGGDWAKDRLIP 
TCTGTGAGGGCTGGTAATGTCATAGGCGGAGGCGATTGGGCTAAAGACCGTTTAATTCCC 9960 

DILRSFEWNQQVI IRNPYSI 
GATATTCTGCGCTCATTTGAAAATAACCAGCAGGTTATTATTCGAAACCCATATTCTATC 10020 

CGTCCCTGGCAGCA ^ LEPLSGYIVVAQRL 

YTEGAKFSEGWNFGPRDEDA 
TATACAGAAGGTGCTAAGTTTTCTGAAGGATGGAATTTCGGCCCGCGTGATGAAGATGCG 10140 

KTVEFIVDKMVTLWGD DASW 
AAGACGGTCGAATTTATTGTTGACAAGATGGTCACGCTTTGGGGTGATGATGCAAGCTGG 10200 

LLDGENHPHEAHYLKLDCSK 
TTACTGGATGGTGAGAATCATCCTCATGAGGCACATTACCTGAAACTGGATTGCTCTAAA 10260 
GCAAATATGCAATTAGGATGGCATCCGCGTTGGGGATTGACTGAAACACTTGGTCGCATC 

VKWHKAWIRGEDMLICSKRE 
GTAAAATGGCATAAAGCATGGATTCGCGGCGAAGATATGTTGATTTGTTCAAAGCGTGAA 

End of ddhB 

ISDYMSATTR * 
ATCAGCGACTATATGTCTGCAACTACTCGTTAAGAAAATAAGTTTAAGGAATCAAAGTAA 
Start of ddhC 

M TANNLREQI SQLVAQYANE 
TGACAGCAAATAACCTGCGTGAGCAAATCTCTCAGCTTGTCGCTCAGTATGCGAATGAGG 

ALSPKPFVAGTSVVPPSGKV 
CATTGAGCCCGAAACCTTTTGTTGCAGGTACAAGCGTTGTGCCTCCTTCCGGGAAGGTTA 

IGAKELQLMVEASLDGWLT T 
TTGGTGCCAAAGAGTTACAATTGATGGTTGAGGCGTCTCTTGATGGATGGCTAACTACTG 

GRFNDAFEKKLGEFIGVPHV 
GTCGTTTCAATGATGCCTTTGAAAAAAAACTTGGGGAATTTATTGGGGTTCCTCATGTTT 

LTTTSGSSANLLALTALTSP 
TAACGACAACATCTGGCTCTTCGGCAAACTTGCTGGCACTGACTGCGCTGACTTCCCCAA 

KLGERALKPGDEVITVAAGF 
AATTAGGCGAGCGAGCTCTCAAACCTGGTGATGAGGTTATTACTGTCGCTGCTGGCTTCC 

PTTVN PAI Q N G L I PVFVDVD 
CGACTACAGTTAACCCGGCGATCCAGAATGGTTTAATACCGGTATTCGTGGATGTTGATA 

IPTYNIDASLIEAAVTEKSK 
TCCCGACATATAATATCGATGCCTCTCTCATTGAAGCTGCAGTTACTGAGAAATCAAAAG 

AIMIAHTLGNAFNLSEVRRI 
CGATAATGATCGCTCATACACTCGGTAATGCATTTAACCTGAGTGAAGTTCGTCGGATTG 

ADKYNLWLI EDC CDALGTTY 
CCGATAAATATAACTTATGGTTGATTGAAGACTGCTGTGATGCCCTTGGGACGACTTATG 

EGQMVGTFGD IGTVSFYPAH 
AAGGCCAGATGGTAGGTACCTTTGGTGACATCGGAACCGTTAGTTTTTATCCGGCTCACC 

HITMGEGGAVFTKSGELKKI 
ATATCACAATGGGTGAAGGCGGTGCTGTATTCACCAAGTCAGGTGAACTGAAGAAAATTA 

IESFRDWGRDCYCAPGCDNT 
TTGAGTCGTTCCGTGACTGGGGCCGGGATTGTTATTGTGCGCCAGGATGCGATAACACCT 

CGKRFGQQLG SL PQGYDHKY 
GCGGTAAACGTTTTGGTCAGCAATTGGGATCACTTCCTCAAGGCTATGATCACAAATATA 

TYSHLGYNLKITDMQAACGL 
CTTATTCCCACCTCGGATATAATCTCAAAATCACGGACATGCAGGCAGCATGTGGTCTGG 

AQLERVEEFVEQRKANFSYL 
CTCAGTTGGAGCGCGTAGAAGAGTTTGTAGAGCAGCG'PAAAGCTAACTTTTCCTATCTGA 

KQGLQSCTEFLELPEATEKS 
AACAGGGCTTGCAATCTTGCACTGAATTCCTCGAATTACCAGAAGCAACAGAGAAATCAG 

DPSWFGFPITLKETSGVNRV 
ATCCATCCTGGTTTGGCTTCCCTATCACCCTGAAAGAAACTAGCGGTGTTAACCGTGTCG 
AACTGGTGAAATTCCTTGATGAAGCAAAAATCGGTACACGTTTACTGTTTGCTGGAAATC 

LIRQPYFANVKYRVVGELTN 
TGATTCGCCAACCGTATTTTGCTAATGTGAAATATCGTGTAGTGGGTGAGTTGACAAATA 

TDRIMNQTFWIGIYPGLTTE 
CCGACCGTATAATGAATCAAACGTTCTGGATTGGTATTTATCCAGGCTTGACTACAGAGC 

HLDYVVSKFEEFFGLNF* 
ATTTAGATTATGTAGTTAGCAAGTTTGAAGAGTTCTTTGGTTTGAATTTCTAATTCAATT 

Start of aba 

MTFLKEYVIVSGA 
TATTCTATCTGGTGATTGCGATGACCTTTTTGAAAGAATATGTAATTGTCAGTGGGGCTT 

SGFIGKHLLEALKKSGISVV 
CCGGCTTTATTGGTAAGCATTTACTCGAAGCGCTAAAAAAATCGGGGATTTCAGTTGTCG 

AITRDVIKNNSNALANVRWC 
CAATCACTCGAGATGTAATAAAAAATAATAGTAATGCATTAGCTAATGTTAGATGGTGCA 

SWDNI EL LV EELS I DSALIG 
GTTGGGATAATATCGAATTA-JTAGTCGAGGAGTTATCAATTGATTCTGCATTAATTGGTA 



L A 



E Y 



K 



N 



TCATTCATTTGGCAACAGAATATGGGCATAAAACATCATCTCTCATAAATATTGAAGATG 

ANVIKPLKLL DLAIKYRADI 
CAAATGTTATAAAACCATTAAAGCTTCTTGATTTGGCAATAAAATATCGGGCGGATATCT 



N 



TTTTAAATACAGMAOT^^ x 2 x 8 0 

Y I ITKRHFDEIGHY YANMHD 
ATATAATTACTAAAAGACACTTTGATGAAATTGGGCATTATTATGCTAATATGCATGACA 12240 

ISFVNMRLEHVYGPGDGENK 
TTTCATTTGTAAACATGCGATTAGAGCATGTATATGGGCCTGGGGATGGTGAAAATAAAT 12300 

FIPYI IDCLNKKQSCVKCTT 
TTATTCCATACATTATCGACTGCTTAAATAAAAAACAGAGTTGCGTGAAATGTACAACAG 12360 

GEQIRDFIFVDDVVNAYLTI 
GCGAACAGATAAGAGACTTTATTTTTGTAGATGATGTGGTAAATGCTTATTTAACTATAT 12420 

LENRKEVPSYTEYQVGTGAG 
TAGAAAATAGAAAAGAAGTACCTTCATATACTGAGTATCAAGTTGGAACTGGTGCTGGGG 12480 

VSLKDFLVYLQNTMMPGS SS 
TAAGTTTGAAAGATTTTCTGGTTTATTTGCAAAATACTATGATGCCAGGTTCATCGAGTA 12540 



E Q 



D N E 



TATTTGAATTTGGTGCGATAGAGCAAAGAGATAATGAAATAATGTTCTCTGTAGCAAATA 

NKNLKAMGWKPNFDYKKGIE 
ATAAAAATTTAAAAGCAATGGGCTGGAAACCAAATTTCGATTATAAAAAAGGAATTGAAG 

End of aba 

E L I. K R L * 

AACTACTGAAACGGTTATOAGATTTTCATGATCTTTTAATAAATAA 

Start of wzx 

AGTCGCGTTATGTTGTAAAAACTAAGTCGTTTAATTGCATAGTSAAAr^ Q 



iGTTCAATTGTTAA 
AAATTCCGAGTCATTTAATTGTTGCAGGTTCATCATGGTTATCCAAAATAATAATTGCCG 12840 

GVQLASI SYLISMLGE EKYA 
GGGTGCAGTTAGCAAGTATTTCATATCTTATTTCTATGCTAGGTGAAGAGAAATATGCAA 12900 

IFSLLTGLLVWCSAVDFGIG 
TCTTTAGTTTGTTAACTGGTTTATTAGTATGGTGTAGCGCTGTTGATTTTGGCATAGGTA 12960 

TGLQNYI SECRAKNKSYDAY 
CAGGACTGCAAAATTATATATGAGAATGCAGAGCCAAAAACAAAAGTTATGATGCATATA 13020 

IKSALHDSFIAI IFFIALFY 
TTAAATCAGCATTACATCTAAGCTTTATAGCTATTATTTTTTTTATTGCTTTATTTTATA 13080 

IFSGVISAKYLS SFHEVLQD 
TTTTTTCTGGGGTAATTTCCGCTAAATATCTTTCTTCTTTTCATGAGGTATTACAGGACA 13140 

KTRMLFFTSCLVFSSIGIGA 
AAACCAGAATGCTCTTTTTTACCTCATGTCTGGTTTTCAGTTCTATTGGAATCGGAGCTA 13200 

IAYKILFAELVGWKAKL LNA 
TTGCTTATAAAATACTTTTTGCCGAATTGGTCGGGTGGAAAGCTAATCTATTAAACGCAT 13260 

LSYMIGMLGLLYIYYRGISV 
TATCTTATATGATAGGTATGCTCGGCTTGCTATATATATACTATAGGGGGATCTCAGTTG 13320 

DIKLSLIVLYLPVGMISLCY 
ACATAAAATTATCACTAATAGTCCTGTATCTTCCAGTGGGTATGATTTCATTGTGCTATA 13380 

IVYRYIKLYHVKTTKSHYIA 
TTGTATATAGATACATAAAGCTTTATCATGTTAAAACAACAAAATCTCATTATATAGCAA 13440 

13500 

T DYMVI S QRLT PADIVQ YTV 
CAGATTATATGGTCATTTCTCAAAGGCTAACTCCTGCTGATATTGTTCAATATACAGTAA 13560 

TMKIFGLVF F IYTAI LQALW 
CGATGAAAATTTTTGGTTTAGTCTTTTTTATTTATACTGCTATTTTGCAAGCATTATGGC 13620 

P ICAELRVKQ QWKKLNKMIG 
CTATATGTGCTGAATTGAGAGTCAAACAGCAATGGAAAAAACTTAACAAAATGATAGGTG 13680 

VNILLGSLYVVGCTIFIYLF 
TCAATATTTTGCTTGGCTCACTATATGTTGTTGGATGTACAATATTTATTTATTTATTTA 13740 

KEQIFSVIAKDINYQVSILS 
AAGAACAGATATTTTCAGTAATAGCCAAAGATATTAATTATCAAGTTTCTATTTTATCTT 13800 

FMLIGIYFCIRVWCDTYAML 
TTATGTTAATTGGCATATATTTCTGTATTCGCGTTTGGTGTGACACTTATGCAATGTTAT 13860 

LQSMNYLKILWILVPLQAI I 
TGCAAAGTATGAATTATTTAAAAATACTTTGGATATTAGTACCACTACAAGCAATAATTG 13920 

GGIAQWYFSSTLGISGVLLG 
GTGGAATAGCACAATGGTATTTTTCTAGTACGCTTGGAATCAGTGGAGTGCTGCTTGGCT 13980 

LIISFALTVFWGLPLTYLIK 
TGATTATATCTTTTGCTTTAACTGTTTTTTGGGGGCTTCCACTAACTTACTTAATTAAGG 14040 
End of wzx Start of wbaV 

A N K G * MLISFCIPTYNRKQ 
CAAATAAGGGATAATCATATGCTTATATCATTTTGTATTCCAACTTATAATAGAAAACAA 

YLEELLNSINNQEKFNLDIE 
TATCTTGAAGAGTTGTTGAATAGTATAAATAATCAGGAAAAATTTAATTTAGATATTGAG 



S T D G 



I D V W- 



ATATGTATATCAGATAATGCCTCTACTGATGGTACAGAGGAAATGATTGATGTTTGGAGG 

NNYNFPI IYRRNSVNLGPDR 
AACAATTATAATTTCCCAATAATATATCGGCGTAATAGCGTTAACCTTGGGCCAGATAGG 



AATTTTCTTGCTTCAGTATCCCTTGCGAATGGGGATTATTGTTGGATATTTGGCAGTGAT 

DALAKDSLAI LQTYLDSQAD 
GATGCTCTTGCGAAAGACTCGTTAGCGATATTACAAACTTATCTCGATTCTCAAGCAGAT 

IYLCDRKETGCDLVEIRNPH 
ATATATTTATGTGACAGAAAAGAGACCGGGTGTGATTTAGTTGAGATTAGAAACCCTCAT 



CGTTCTTGGCTCAGAACAGATGATGAACTTTATGTGTTTAATAATAATTTAGATAGGGAA 



ATCTATCTCAGTAGATGCTTATCTATTGGTGGTGTArTTAGCTATCTAAGTTCTTTAATA 



GTAAAAAAAGAACGATGGGATGCCATTGATTTTGATGCGTCCTATATTGGCACTTCCTAT 

PHVFIMMSVFNTPGCLLHYI 
CCTCATGTATTTATCATGATGAGCGTATTTAATACGCCAGGGTGCCTTTTGCATTATATA 



TCAAAACCACTCGTAATATGCCGAGGAGATAATGATAGTTTCGAGAAGAAAGGAAAGGCC 

RRILIDFIAYLKLANDFYSK 
AGACGAATTTTAATTGATTTTATTGCATATTTAAAATTAGCTAATGATTTTTACAGTAAA 



AATATATCTTTAAAACGAGCATTTGAAAATGTTTTGCTAAAAGAGAGACCATGGTTATAT 

TTLAMACYGNSDEKRDLSEF 
ACAACTTTCGCTATGGCATGTTATGGCAATAGTGATGAAAAAAGAGATTTATCTGAATTT 



TATGCAAAGCTAGGTTGTAATAAAAATATGATCAACACTGTACTTCGATTTGGGAAACTA 
Y A V K N 



End of wbaV 



GCATATGCAGTGAAAAATATTACCGTGCTTAAGAATTTTACTAAACGGATAATTAAGTAG 15060 
TAGTAAGTTATTATATTGAGATTAAATGTAGATTTAACCTTTCTGGATTCAGCTAGATTT 15120 



ACGTTACTGACTTTTCTTTTTAATGAAAATCATATTTGATATATATAAATAAATTTGGAT 



15180 
15240 
15300 
15360 



ATTGTTTTTGTAGTGTTTTACTGCCGGTATTACATTAACTCTATTATTAAGAATTACACC 
TAGTGTAi^GCTTCGTAATATTATTTATCCTTATGATTATTGCTTTAAAGATGCGTATGGA 

Start of wbaU 

MIVNLSRLGKSGTG 
AAAACGGAGAGCTATTCAATGATCGTAAACCTATCACGTTTAGGTAAAAGTGGTACGGGA 15420 
MWQYSIKFLTALREIADVDA 
ATGTGGCAATACTCGATTAAATTTTTAACGGCACTGCGAGAAATAGCTGATGTTGACGCA 15480 

IICSKVHADYFEKLGYAVVT 
ATAATCTGTAGCAAGGTACACGCTGATTATTTTGAAAAGCTCGGTTATGCAGTAGTTACT 15540 

VPNIVSNTSKTSRLRPLVWY 
GTTCCGAATATTGTTAGCAACACATCAAAAACATCGCGACTTAGACCATTAGTATGGTAT 15600 

VYSYWLALRVLIKFGNK KLV 
GTATATAGTTACTGGCTTGCGCTCAGGGTTTTAATTAAGTTTGGTAATAAAAAATTGGTG 15660 

CTTHHTIPLLRNQTITVHDI 
TGTACTACACATCACACTATCCCCTTACTGAGAAACCAAACGATAACCGTACATGATATA 15720 

RPFYYPDSFIQKVYFRFLLK 
AGACCTTTTTATTATCCAGATAGTTTTATTCAGAAAGTGTATTTTCGCTTTTTATTAAAA 15780 

MSVKRCKHVLTVSYTVKDSI 
ATGTCCGTTAAGCGATGTAAGCATGTTTTAACGGTATCTTATACCGTTAAAGATAGCATT 15840 

AKTYNVDSEKISVIYNSVNK 
GCTAAAACTTATAATGTAGATAGTGAGAAAATATCAGTAATTTATAATAGTGTTAATAAA 15900 

SDFIQKKEKENYFLAVGASW 
TCTGATTTTATACAAAAAAAAGAAAAAGAGAATTACTTTTTAGCTGTTGGTGCAAGTTGG 15960 

PHKNIHSFIKNKKVWSDSYN 
CCACATAAAAATATTCATTCATTCATAAAAAATAAAAAAGTTTGGTCTGACTCTTATAAT 16020 

LI I VCGRTDYAM S L Q QMVVD 
TTAATTATTGTATGTGGTCGTACTGACTATGCAATGTCTCTCCAACAAATGGTCGTTGAT 16080 

LELKDKVTFLHEVSFNELKI 
CTGGAACTAAAAGATAAAGTGACTTTTTTACATGAAGTCTCATTTAATGAATTAAAGATT 16140 

LYSKAYALVYPSIDEGFGIP 
TTATATTCTAAAGCCTACGCGCTTGTTTATCCATCTATTGATGAGGGTTTTGGTATACCT 16200 

PI EAMASNTPVIVSDIPVFH 
CCTATTGAAGCGATGGCATCAAATACTCCAGTTATAGTGTCCGATATACCAGTATTTCAT 16260 

EVLTNGALYVNPDDEKSWQS 
GAAGTGTTAACCAATGGTGCATTATATGTGAATCCGGATGATGAAAAAAGCTGGCAGAGT 16320 

AI KNIEQLPDAI SRFN NYVA 
GCAATTAAAAATATAGAGCAGTTGCCTGATGCAATTTCCCGATTTAACAACTATGTCGCA 16380 

RYDFDNMKQMVGNWLAESK* 
CGGTATGACTTTGATAATATGAAGCAGATGGTTGGCAATTGGTTGGCGGAATCAAAATAA 16440 

Start of wbaN 

MKITLIIPTYNAGSLWPNVL 
ATGAAAATAACATTAATTATTCCCACATATAATGCAGGGTCGCTTTGGCCTAATGTTCTG 16500 

DAIKQQTIYPDKLIVIDSGS 
GATGCGATTAAGCAGCAAACTATATATCCGGATAAATTGATTGTTATAGACTCAGGTTCT 16560 

KDETVPLASDLKNISIFNID 
AAAGATGAAACGGTTCCGTTAGCCTCAGACCTGAAAAATATATCAATATTTAATATTGAC 16620 

SKDFNHGGTRNLAVAKTLDA 
TCTAAAGATTTTAATCATGGAGGAACCAGAAATTTAGCAGTTGCAAAAACTCTGGACGCT 16680 
DVIIFLTQDAILADSDAIKN 
GATGTTATAATTTTTCTAACGCAAGATGCAATTCTCGCGGATTCGGATGCAATTAAAAAT 16740 

LVYYFSDPLIAAVCGRQLPH 
TTGGTTTATTATTTTTCAGATCCATTGATAGCAGCGGTTTGTGGTAGACAACTTCCTCAT 16800 

KDANPLAVHARNFNYS SKSI 
AAAGATGCTAATCCCCTTGCAGTGCATGCCAGAAATTTTAATTATAGTTCAAAATCTATT 16860 

VKSKADIEKLGIKTVFMSNS 
GTTAAAAGTAAGGCAGATATAGAAAAATTGGGTATTAAAACTGTATTTATGTCCAATTCT 16920 

FAAYRRSVFEELSGFPEHTI 
TTTGCTGCCTATCGCCGTTCCGTTTTTGAAGAGTTAAGTGGGTTTCCTGAACATACAATT 16980 

LAEDMFMAAKMIQAGYKVAY 
CTTGCCGAGGATATGTTTATGGCGGCTAAGATGATTCAGGCGGGTTATAAGGTCGCCTAC 17040 

CAEAVVRHSHNYTPRE EFQR 
TGCGCTGAAGCGGTGGTAAGACACTCCCATAATTATACCCCGCGAGAAGAGTTTCAACGA 17100 

YFDTGVFHACSPWIQRDFGG 
TATTTTGATACTGGTGTATTTCATGCTTGTTCTCCGTGGATTCAGCGTGACTTTGGCGGA 17160 

AGGEGFRFVKSEIQFL LKNA 
GCCGGTCGTGAGGGTTTCCGCTTCGTAAAATCAGAGATTCAATTCCTGCTTAAAAATGCA 17220 

PFWIPRAL LTTFAKFLGYKL 
CCGTTCTGGATTCCAAGAGCTTTATTAACAACCTTTGCTAAATTCTTGGGTTACAAATTA 17280 

GKHWQSLPLSTCRYFSMYKS 
GGCAAGCATTGGCAATCTTTACCGTTGTCTACATGTCGCTATTTTAGCATGTACAAGAGT 17340 

End of wbaN Start of manC 

YWNNIQYSSSKEIK*MSFLP 
TATTGGAATAATATCCAATATTCTTCGTCAAAAGAGATAAAATAAATGTCTTTTCTTCCC 17400 

VIMAG GTGSRLWPL SREYHP 
GTAATTATGGCTGGCGGCACAGGTAGCCGTTTATGGCCGCTTTCACGCGAATATCATCCG 17460 

KQFLSVEGKLSMLQNTIKRL 
AAGCAGTTTCTAAGCGTTGAAGGTAAACTATCAATGCTGCAAAATACTATAAAGCGATTA 17520 

ASLSTEEPVVICNDRHRFLV 
GCTTCACTTTCTACAGAAGAACCCGTTGTCATTTGCAATGACAGACACCGTTTCTTAGTC 17580 

AEQLREIDKLANNI ILEPVG 
GCTGAACAACTCCGTGAAATTGACAAGTTAGCAAATAATATTATTCTCGAACCGGTAGGC 17640 

RNTAPAIALAAFCALQNADN 
CGTAATACTGCACCAGCGATCGCTCTTGCCGCGTTTTGTGCGCTCCAGAATGCTGATAAT 17700 

ADPLLLVLAADHVIQDEIAF 
GCTGATCCTCTTTTGTTGGTTCTTGCTGCAGATCATGTGATTCAGGATGAAATAGCTTTT 17760 

TKAVRHAEEYAANGKLVTFG 
ACGAAAGCTGTCAGACATGCTGAAGAATACGCTGCAAATGGTAAGCTTGTAACTTTTGGT 17820 

IVPTHAETGYGYIR RG ELIG 
ATTGTTCCAACGCATGCTGAAACGGGTTATGGATATATTCGTCGTGGTGAGTTGATAGGA 17880 

NDAYAVAEFVEKPDIDTAGD 
AATGACGCTTATGCAGTGGCTGAATTTGTGGAGAAACCGGATATCGATACCGCCGGTGAC 17940 

YFKSGKYYWNSGMFLFRASS 
TATTTCAAATCAGGGAAATATTACTGGAATAGCGGTATGTTTTTATTTCGTGCAAGCTCT 18000 



Figure 10/13 



YLNELKYLSPEIYKACEKAV 
TATTTAAACGAATTAAAGTATTTATCACCTGAAATTTATAAAGCTTGTGAAAAGGCGGTA 18060 

GHINPDLDFIRIDKEEFMSC 
GGACATATAAATCCCGATCTTGATTTTATTCGTATTGATAAAGAAGAGTTTATGTCATGC 18120 

PSDSIDYAVMEHTQHAVVIP 
CCGAGTGATTCTATCGATTATGCAGTTATGGAGCACACACAGCATGCGGTGGTGATACCA 18180 

MSAGWSDVGSWSSLWDISNK 
ATGAGCGCTGGCTGGTCGGATGTGGGTTCCTGGTCCTCACTTTGGGATATATCGAATAAA 18240 

DHQRNVLKGDIFAHACNDNY 
GATCATCAGAGAAATGTTTTAAAAGGAGATATTTTCGCACATGCTTGTAATGATAATTAC 18300 

IYSEDMFISAIGVSNLVIVQ 
ATTTATTCCGAAGATATGTTTATAAGTGCGATTGGTGTAAGCAATCTTGTCATTGTTCAA 18360 

TTDALLVANKDTVQDVKKIV 
ACAACAGACGCTTTACTGGTGGCTAATAAAGATACAGTACAAGATGTTAAAAAAATTGTC 18420 

DYLKRNDRNEYKQHQEVFRP 
GATTATTTAAAACGGAATGATAGGAACGAATATAAACAACATCAAGAAGTTTTCCGCCCC 18480 

WGKYNVIDSGKNYLVRCITV 
TGGGGAAAATATAATGTGATTGATAGCGGCAAAAATTACCTCGTTCGATGTATCACTGTT 18540 

KPGEKFVAQMHHHRAEHWIV 
AAGCCGGGTGAGAAATTTGTGGCGCAGATGCATCACCACCGGGCTGAGCATTGGATAGTA 18600 

LSGTARVTKGEQTYMVSENE 
TTATCCGGGACTGCTCGTGTTACAAAGGGAGAGCAGACTTATATGGTTTCTGAAAATGAA 18660 

STFIPPNTIHALENPGMTPL 
TCAACATTTATTCCTCCGAATACTATTCACGCGCTGGAAAATCCTGGAATGACCCCCCTG 18720 

KLIEIQSGTYLGEDDIIRLE 
AAGTTAATTGAGATTCAATCAGGTACCTATCTTGGTGAGGATGATATTATTCGTTTAGAA 18780 

Start of manB End of manC 

MNVVNNSRDV 

QRSGFSKEWTNERS* 
CAACGTTCTGGATTTTCGAAGGAGTGGACTAATGAACGTAGTTAATAATAGCCGTGATGT 18840 

IYSSGIVFGTSGARGLVKDF 
TATTTATTCATCAGGTATTGTGTTTGGAACGAGTGGGGCTCGCGGTCTTGTAAAAGATTT 18900 

TPQVCAAFTVSFVAVMQEHF 
TAC^CCTCAGGTATGTGCTGCTTTTACGGTTTCATTTGTTGCCGTTATGCAGGAAC^ 18960 

SFDTVALAIDNRPSSYGMAQ 
TTCCTTTGATACCGTAGCATTGGCAATAGATAATCGTCCAAGTAGTTATGGGATGGCTCA 19020 

ACAAALADKGVNCIFYGVVP 
GGCGTGTGCTGCTGCATTGGCGGATAAAGGCGTTAACTGTATTTTTTATGGAGTGGTACC 19080 

TPALAFQSMSDNMPAIMVTG 
AACCCCAGCTTTGGCCTTTCAGTCTATGTCTGACAATATGCCTGCGATAATGGTTACGGG 19140 

SHIPFERNGLKFYRPDGEIT 
AAGTCATATTCCATTCGAGCGGAACGGCCTCAAGTTTTATCGTCCTGATGGTGAAATCAC 19200 

KHDEAAILSVEDTCSHLELK 
GAAACATGATGAGGCTGCGATCCTTAGTGTTGAAGATACGTGCAGCCATTTAGAGCTTAA 19260 
ELIVSEMAAVNYISRYTSLF 
AGAACTCATAGTTTCAGAAATGGCTGCTGTTAATTATATATCTCGTTATACATCTTTATT 19320 

STPFLKNKRIGIYEHSSAGR 
TTCTACTCCATTCCTCAAAAATAAGCGTATTGGTATTTACGAACATTCAAGCGCTGGGCG 19380 

DLYKPLFIALGAEVVSLGRS 
TGATCTTTATAAGCCTTTATTTATTGCATTGGGGGCTGAAGTCGTTAGCTTGGGTAGAAG 19440 



DNFVPIDTEAVSKEDREKAR 
CGATAATTTTGTACCTATAGATACAGAGGCTGTAAGCAAAGAGGATCGGGAAAAAGCTCG 19500 

SWAKEFDLDAIFSTDGDGDR 
CTCATGGGCTAAAGAGTTCGATTTAGATGCCATATTCTCGACAGATGGGGATGGTGATCG 19560 



PLIADEAGEWLRGDILGLLC 
CCCTCTTATTGCTGATGAGGCCGGTGAGTGGCTAAGAGGCGATATACTAGGTCTATTATG 19620 

SLALDAEAVAIPVSCNSIIS 
TTCACTTGCATTGGATGCAGAAGCCGTCGCTATTCCTGTTAGTTGTAACAGCATAATTTC 19680 

SGRFFKHVKLTKIGSPYVIE 
TTCTGGCCGCTTTTTTAAACATGTTAAGCTTACAAAAATTGGCTCGCCTTATGTTATCGA 19740 



AFNELSRSYSRIVGFEANGG 
AGC^ITTTAATGAATTATCGCGGAGTTATAGTCGTATTGTCGGTTTTGAAGCCAATGGCGG 19800 

FLLGSDICINEQNLHALPTR 
TTTTTTATTAGGAAGCGACATCTGTATTAACGAGCAGAATCTTCATGCCTTACCAACTCG 19860 

DAVLPAIMLLYKSRNTSISA 
TGATGCTGTATTACCAGCAATAATGCTGCTTTACAAAAGTAGGAATACCAGCATTAGCGC 19920 

LVNELPTRYTHSDRLQGITT 
TTTAGTCAATGAACTCCCAACTCGTTACACCCATTCTGACAGATTACAGGGGATTACAAC 19980 

DKSQSLISMGRENLSNLLSY 
TGATAAAAGTCAATCCTTAATTAGTATGGGCAGAGAAAATCTGAGCAACCTCTTAAGCTA 20040 

IGLENEGAISTDMTDGMRIT 
TATTGGTTTGGAGAATGAAGGTGCAATTTCTACAGATATGACAGATGGTATGCGAATTAC 20100 

LRDGCIVHLRASGNAPELRC 
TTTACGTGATGGATGTATTGTGCATTTGCGCGCTTCTGGTAATGCACCTGAGTTACGCTG 20160 



YAEANLLNRAQDLVNTTLAN 
CTATGCAGAAGCTAATTTATTAAATAGGGCTCAGGATCTTGTAAATACAACGCTTGCTAA 20220 



IKKRCLL* 
TATTAAAAAACGATGCTTGCTG TAAAAAAATTGAATGTTATTTACTTAATATGCCTATTT 20280 



Start of wbaP 

MDNIDNKY 
TATTTACATTATGCACGGTCAGAGGGTGAGGATTAAATGGATAATATTGATAATAAGTAT 



LWFSLGCVYFIFDQVQRFIP 
TTATGGTTTTCATTAGGATGTGTCTATTTTATTTTTGATCAAGTACAGCGATTTATTCCT 

QDQLDTRVITHFILSVVCVG 
CAAGACCAATTAGATACAAGAGTTATTACGCATTTTATTTTGTCAGTAGTATGTGTCGGT 
WFWIRLRHYTIRKPFWYELK 
TGGTTTTGGATTCGTTTGCGACATTATACTATCCGCAAGCCATTTTGGTATGAGTTAAAA 20580 

EIFRTIVIFAIFDLALIAFT 
GAAATTTTTCGTACGATCGTTATTTTTGCTATATTTGATTTGGCTCTGATAGCGTTTACA 20640 

KWQFSRYVWVFCWTFALILV 
AAATGGCAGTTTTCACGCTATGTCTGGGTGTTTTGTTGGACTTTTGCCCTAATCCTGGTG 20700 

PFFRALTKHLLNKLGIWKKK 
CCTTTTTTTCGCGCACTTACAAAGCATTTATTGAACAAGCTAGGTATCTGGAAGAAAAAA 20760 

TIILGSGQNARGAYSALQSE 
ACTATCATCCTGGGGAGCGGACAGAATGCTCGTGGTGCATATTCTGCGCTGCAAAGTGAG 20820 

EMMGFDVIAFFDTDASDAEI 
GAGATGATGGGGTTTGATGTTATCGCTTTTTTTGATACGGATGCGTCAGATGCTGAAATA 20880 

NMLPVIKDTEIIWDLNRTGD 
AATATGTTCCCGGTGATAAAGGATACTGAGATTATTTGGGATTTAAATCGTACAGGTGAT 20940 

VHYILAYEYTELEKTHFWLR 
GTCCATTATATCCTTGCTTATGAATACACCGAGTTGGAGAAAACACATTTTTGGCTACGT 21000 

ELSKHHCRSVTVVPSFRGLP 
GAACTTTCAAAACATCATTGTCGTTCTGTTACTGTAGTCCCCTCGTTTAGAGGATTGCCA 21060 

LYKTDMSFIFSHEVMLLRIQ 
TTATATAATACTGATATGTCTTTTATCTTTAGCCATGAAGTTATGTTATTAAGGATACAA 21120 

NNLAKRSSRFLKRTFDIVCS 
AATAACTTGGCTAAAAGGTCGTCCCGTTTTCTCAAACGGACATTTGATATTGTTTGTTCA 21180 

IMILIIASPLMIYLWYKVTR 
ATAATGATTCTTATAATTGCATCACCACTTATGATTTATCTGTGGTATAAAGTTACTCGA 21240 

DGGPAIYGHQRVGRHGKLFP 
GATGGTGGTCCGGCTATTTATGGTCACCAGCGAGTAGGTCGGCATGGAAAACTTTTTCCA 21300 

CYKFRSMVMNSQEVLKELLA 
TGCTACAAATTTCGTTCTATGGTTATGAATTCTCAAGAGGTACTAAAAGAACTTTTGGCT 21360 

NDPIARAEWEKDFKLKNDPR 
AACGATCCTATTGCCAGGGCTGAATGGGAGAAAGATTTTAAACTGAAAAATGATCCTCGA 21420 

ITAVGRFIRKTSLDELPQLF 
ATCACAGCTGTAGGTCGATTTATACGTAAAACTAGCCTTGATGAGTTGCCACAACTTTTT 21480 

NVLKGDMSLVGPRPIVSDEL 
AATGTACTAAAAGGTGATATGAGCCTGGTTGGACCACGACCTATCGTTTCGGATGAACTG 21540 

ERYCDDVDYYLMAKPGMTGL 
GAGCGTTATTGTGATGATGTTGATTATTATTTGATGGCAAAGCCGGGCATGACAGGTCTA 21600 

WQVSGRNDVDYDTRVYFDSW 
TGGCAAGTGAGTGGGCGTAATGATGTTGATTATGACACTCGTGTTTATTTTGATTCCTGG 21660 

YVKKWTLWNDIAILFKTAKV 
TATGTTAAAAACTGGACGCTTTGGAATGATATTGCCATTCTGTTTAAAACAGCGAAAGTT 21720 

End of wbaP 

VLRRDGAY* 
GTTTTGCGGCGAGATGGTGCGTAT TAAGCTTACCGAGAAGTACTGAATAATAATTGTATA 21780 

AATTAGCCTGCGTAAAATCTGAACGCATCAATCGCTACCTTAATATCATACCTTTGAGTT 21840 
AACATACTATTCACCTTTAACCTGCCATGACCGTTTGTGGCAGGGTTTCCACACCTGACA 21900 
GGAGTATGTAATGTCCAAGCAACAGATCGGCGTCGTCGGTATGGCAGTGATGGGGCGCAA 21960 
CCTCGCGCTCAACATCGAAAGCCGTGGTTATACCGTCTCCGTTTTCAACCGCTCCCGTGA 22020 
AAAGACCGAAGAAGTGATTGCCGAGAATCCCGGCAAAAAGCTGGTGCCTTATTACACGGT 22080 